Commit Graph

6135 Commits

Author SHA1 Message Date
Vladimir Sementsov-Ogievskiy 37e9403ea8 block/io: support int64_t bytes in bdrv_co_p{read,write}v_part()
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, prepare bdrv_co_preadv_part() and bdrv_co_pwritev_part() and their
remaining dependencies now.

bdrv_pad_request() is updated simultaneously, as pointer to bytes passed
to it both from bdrv_co_pwritev_part() and bdrv_co_preadv_part().

So, all callers of bdrv_pad_request() are updated to pass 64bit bytes.
bdrv_pad_request() is already good for 64bit requests, add
corresponding assertion.

Look at bdrv_co_preadv_part() and bdrv_co_pwritev_part().
Type is widening, so callers are safe. Let's look inside the functions.

In bdrv_co_preadv_part() and bdrv_aligned_pwritev() we only pass bytes
to other already int64_t interfaces (and some obviously safe
calculations), it's OK.

In bdrv_co_do_zero_pwritev() aligned_bytes may become large now, still
it's passed to bdrv_aligned_pwritev which supports int64_t bytes.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-15-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:17:11 -06:00
Vladimir Sementsov-Ogievskiy 8b0c5d7659 block/io: support int64_t bytes in bdrv_aligned_preadv()
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, prepare bdrv_aligned_preadv() now.

Make the bytes variable in bdrv_padding_rmw_read() int64_t, as it is
only used for pass-through to bdrv_aligned_preadv().

All bdrv_aligned_preadv() callers are safe as type is widening. Let's
look inside:

 - add a new-style assertion that request is good.
 - callees bdrv_is_allocated(), bdrv_co_do_copy_on_readv() supports
   int64_t bytes
 - conversion of bytes_remaining is OK, as we never have requests
   overflowing BDRV_MAX_LENGTH
 - looping through bytes_remaining is ok, num is updated to int64_t
   - for bdrv_driver_preadv we have same limit of max_transfer
   - qemu_iovec_memset is OK, as bytes+qiov_offset should not overflow
     qiov->size anyway (thanks to bdrv_check_qiov_request())

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-14-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweak]
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:17:11 -06:00
Vladimir Sementsov-Ogievskiy 9df5afbdd1 block/io: support int64_t bytes in bdrv_co_do_copy_on_readv()
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, prepare bdrv_co_do_copy_on_readv() now.

'bytes' type widening, so callers are safe. Look at the function
itself:

bytes, skip_bytes and progress become int64_t.

bdrv_round_to_clusters() is OK, cluster_bytes now may be large.
trace_bdrv_co_do_copy_on_readv() is OK

looping through cluster_bytes is still OK.

pnum is still capped to max_transfer, and to MAX_BOUNCE_BUFFER when we
are going to do COR operation. Therefor calculations in
qemu_iovec_from_buf() and bdrv_driver_preadv() should not change.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-13-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:17:11 -06:00
Vladimir Sementsov-Ogievskiy fcfd9ade68 block/io: support int64_t bytes in bdrv_aligned_pwritev()
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, prepare bdrv_aligned_pwritev() now and convert the dependencies:
bdrv_co_write_req_prepare() and bdrv_co_write_req_finish() to signed
type bytes.

Conversion of bdrv_co_write_req_prepare() and
bdrv_co_write_req_finish() is definitely safe, as all requests in
block/io must not overflow BDRV_MAX_LENGTH. Still add assertions.

For bdrv_aligned_pwritev() 'bytes' type is widened, so callers are
safe. Let's check usage of the parameter inside the function.

Passing to bdrv_co_write_req_prepare() and bdrv_co_write_req_finish()
is OK.

Passing to qemu_iovec_* is OK after new assertion. All other callees
are already updated to int64_t.

Checking alignment is not changed, offset + bytes and qiov_offset +
bytes calculations are safe (thanks to new assertions).

max_transfer is kept to be int for now. It has a default of INT_MAX
here, and some drivers may rely on it. It's to be refactored later.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-12-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:16:03 -06:00
Vladimir Sementsov-Ogievskiy 5ae07b1410 block/io: support int64_t bytes in bdrv_co_do_pwrite_zeroes()
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, prepare bdrv_co_do_pwrite_zeroes() now.

Callers are safe, as converting int to int64_t is safe. Concentrate on
'bytes' usage in the function (thx to Eric Blake):

    compute 'int tail' via % 'int alignment' - safe
    fragmentation loop 'int num' - still fragments with a cap on
      max_transfer

    use of 'num' within the loop
    MIN(bytes, max_transfer) as well as %alignment - still works, so
         calculations in if (head) {} are safe
    clamp size by 'int max_write_zeroes' - safe
    drv->bdrv_co_pwrite_zeroes(int) - safe because of clamping
    clamp size by 'int max_transfer' - safe
    buf allocation is still clamped to max_transfer
    qemu_iovec_init_buf(size_t) - safe because of clamping
    bdrv_driver_pwritev(uint64_t) - safe

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-11-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:16:03 -06:00
Vladimir Sementsov-Ogievskiy 17abcbeee2 block/io: use int64_t bytes in driver wrappers
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, convert driver wrappers parameters which are already 64bit to
signed type.

Requests in block/io.c must never exceed BDRV_MAX_LENGTH (which is less
than INT64_MAX), which makes the conversion to signed 64bit type safe.

Add corresponding assertions.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-10-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:16:03 -06:00
Eric Blake 8024726459 block: use int64_t as bytes type in tracked requests
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

All requests in block/io must not overflow BDRV_MAX_LENGTH, all
external users of BdrvTrackedRequest already have corresponding
assertions, so we are safe. Add some assertions still.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-9-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:14:15 -06:00
Vladimir Sementsov-Ogievskiy 63f4ad1186 block/io: improve bdrv_check_request: check qiov too
Operations with qiov add more restrictions on bytes, let's cover it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-8-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:14:00 -06:00
Vladimir Sementsov-Ogievskiy 801625e69d block/throttle-groups: throttle_group_co_io_limits_intercept(): 64bit bytes
The function is called from 64bit io handlers, and bytes is just passed
to throttle_account() which is 64bit too (unsigned though). So, let's
convert intermediate argument to 64bit too.

This patch is a first in the 64-bit-blocklayer series, so we are
generally moving to int64_t for both offset and bytes parameters on all
io paths. Main motivation is realization of 64-bit write_zeroes
operation for fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

Patch-correctness audit by Eric Blake:

  Caller has 32-bit, this patch now causes widening which is safe:
  block/block-backend.c: blk_do_preadv() passes 'unsigned int'
  block/block-backend.c: blk_do_pwritev_part() passes 'unsigned int'
  block/throttle.c: throttle_co_pwrite_zeroes() passes 'int'
  block/throttle.c: throttle_co_pdiscard() passes 'int'

  Caller has 64-bit, this patch fixes potential bug where pre-patch
  could narrow, except it's easy enough to trace that callers are still
  capped at 2G actions:
  block/throttle.c: throttle_co_preadv() passes 'uint64_t'
  block/throttle.c: throttle_co_pwritev() passes 'uint64_t'

  Implementation in question: block/throttle-groups.c
  throttle_group_co_io_limits_intercept() takes 'unsigned int bytes'
  and uses it: argument to util/throttle.c throttle_account(uint64_t)

  All safe: it patches a latent bug, and does not introduce any 64-bit
  gotchas once throttle_co_p{read,write}v are relaxed, and assuming
  throttle_account() is not buggy.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20201211183934.169161-7-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:14:00 -06:00
Vladimir Sementsov-Ogievskiy 98ca45494f block/io: bdrv_pad_request(): support qemu_iovec_init_extended failure
Make bdrv_pad_request() honest: return error if
qemu_iovec_init_extended() failed.

Update also bdrv_padding_destroy() to clean the structure for safety.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-6-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:14:00 -06:00
Vladimir Sementsov-Ogievskiy f0deecff82 block/io: refactor bdrv_pad_request(): move bdrv_pad_request() up
Prepare for the following patch when bdrv_pad_request() will be able to
fail. Update the comments.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-5-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweak]
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:00:52 -06:00
Vladimir Sementsov-Ogievskiy a56ed80c42 block: fix theoretical overflow in bdrv_init_padding()
Calculation of sum may theoretically overflow, so use 64bit type and
add some good assertions.

Use int64_t constantly.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: tweak assertion order]
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:00:33 -06:00
Vladimir Sementsov-Ogievskiy 4c002cef0e util/iov: make qemu_iovec_init_extended() honest
Actually, we can't extend the io vector in all cases. Handle possible
MAX_IOV and size_t overflows.

For now add assertion to callers (actually they rely on success anyway)
and fix them in the following patch.

Add also some additional good assertions to qemu_iovec_init_slice()
while being here.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-3-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:00:33 -06:00
Vladimir Sementsov-Ogievskiy 69b55e03f7 block: refactor bdrv_check_request: add errp
It's better to pass &error_abort than just assert that result is 0: on
crash, we'll immediately see the reason in the backtrace.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-2-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: fix iotest 206 fallout]
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:00:33 -06:00
Kevin Wolf 26513a0174 block: Fix VM size column width in bdrv_snapshot_dump()
size_to_str() can return a size like "4.24 MiB", with a single digit
integer part and two fractional digits. This is eight characters, but
commit b39847a5 changed the format string to only reserve seven
characters for the column.

This can result in unaligned columns, which in turn changes the output of
iotests case 267 because exceeding the column size defeats the attempt
to filter the size out of the output (observed with the ppc64 emulator).
The resulting change is only a whitespace change, but since commit
f203080b this is enough for iotests to consider the test failed.

Taking a character away from the tag name column and adding it to the VM
size column doesn't change anything in the common case (the tag name is
left justified, the VM size is right justified), but fixes this case.

Fixes: b39847a505
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210202155911.179865-1-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-02-02 17:23:55 +01:00
Philippe Mathieu-Daudé fcc8672aca block/nvme: Trace NVMe spec version supported by the controller
NVMe controllers implement different versions of the spec,
and different features of it. It is useful to gather this
information when debugging.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210127212137.3482291-3-philmd@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-02-02 17:05:38 +01:00
Philippe Mathieu-Daudé 97b709f32e block/nvme: Properly display doorbell stride length in trace event
Commit 15b2260bef ("block/nvme: Trace controller capabilities")
misunderstood the doorbell stride value from the datasheet, use
the correct one. The 'doorbell_scale' variable used few lines
later is correct.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210127212137.3482291-2-philmd@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-02-02 17:05:38 +01:00
Peter Maydell 7e7eb9f852 QAPI patches patches for 2021-01-28
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAmASY10SHGFybWJydUBy
 ZWRoYXQuY29tAAoJEDhwtADrkYZT4M4P+gKN64+WaErotLHHsqtiA0aoTwbTFXin
 OEyR5du0+PjX96qYHHV+ZDn5uxxKI57/SRNooPndjU63sYgAbNApfsu+wUDZC844
 qMSlrmyw2+Lw1EIykoLXK49+pEDU0XpVIciL5+zEdtCgjiJRjrOOJ/JRBcKoQNHn
 UArGNQ8y0D+0i8uXyJjyvQeHdz6KUr9sX1vqwRGMt9axEMDJks0+Si4Zg3z2wlWJ
 Sc3WsXEhikxK1qkF2/6VsopOgNGB0UUvV6q1GO6ngdqag1Hb6mACzSv9mtIShGjh
 a2MISBhxF8h4wfO8U5TiS9vBgYR3elA3kRGsn4FOfD3sSilt/SWLPHWXdlO1aL2E
 TollRPtYBqn2YIYQP1SEp7NIqaWC/QaGkP/mH8Jvv0YlL64RK879lv6KiHKzfvI7
 HBD7WGZBwMQqPczuw308tqDTQPKUsPDYoEJAFRywkLry86wL8DBOlkQ0lWUjF06s
 UQk/i09nhrcNLo0GbmgAOHUVj4m03zLyMW/fYmsQ8xe9/b6GBwJvtm2v5wKwg0HE
 ixxj4oBIk5YV5Xwt7DKLkT0voPAAgNK13a6ywzbyfsigwaJaO9tLtZ0PMuaT9kgs
 b/OBdeeIYpFdIT/DlcMWpIFi53VYe0McX8MmprHcMZb1133wk5Z5gk+FAWLMifrw
 2ltmoUPoB1dC
 =djiE
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2021-01-28' into staging

QAPI patches patches for 2021-01-28

# gpg: Signature made Thu 28 Jan 2021 07:10:21 GMT
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-qapi-2021-01-28:
  qapi: More complex uses of QAPI_LIST_APPEND
  qapi: Use QAPI_LIST_APPEND in trivial cases
  qapi: Introduce QAPI_LIST_APPEND
  qapi: A couple more QAPI_LIST_PREPEND() stragglers
  net: Clarify early exit condition

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-01-28 22:43:18 +00:00
Eric Blake 95b3a8c8a8 qapi: More complex uses of QAPI_LIST_APPEND
These cases require a bit more thought to review; in each case, the
code was appending to a list, but not with a FOOList **tail variable.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210113221013.390592-6-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Flawed change to qmp_guest_network_get_interfaces() dropped]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2021-01-28 08:08:45 +01:00
Eric Blake c3033fd372 qapi: Use QAPI_LIST_APPEND in trivial cases
The easiest spots to use QAPI_LIST_APPEND are where we already have an
obvious pointer to the tail of a list.  While at it, consistently use
the variable name 'tail' for that purpose.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210113221013.390592-5-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2021-01-28 08:08:45 +01:00
Kevin Wolf 86b1cf3227 block: Separate blk_is_writable() and blk_supports_write_perm()
Currently, blk_is_read_only() tells whether a given BlockBackend can
only be used in read-only mode because its root node is read-only. Some
callers actually try to answer a slightly different question: Is the
BlockBackend configured to be writable, by taking write permissions on
the root node?

This can differ, for example, for CD-ROM devices which don't take write
permissions, but may be backed by a writable image file. scsi-cd allows
write requests to the drive if blk_is_read_only() returns false.
However, the write request will immediately run into an assertion
failure because the write permission is missing.

This patch introduces separate functions for both questions.
blk_supports_write_perm() answers the question whether the block
node/image file can support writable devices, whereas blk_is_writable()
tells whether the BlockBackend is currently configured to be writable.

All calls of blk_is_read_only() are converted to one of the two new
functions.

Fixes: https://bugs.launchpad.net/bugs/1906693
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210118123448.307825-2-kwolf@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-01-27 20:45:20 +01:00
David Edmondson 797e3e3805 block: report errno when flock fcntl fails
When a call to fcntl(2) for the purpose of adding file locks fails
with an error other than EAGAIN or EACCES, report the error returned
by fcntl.

EAGAIN or EACCES are elided as they are considered to be common
failures, indicating that a conflicting lock is held by another
process.

No errors are elided when removing file locks.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210113164447.2545785-1-david.edmondson@oracle.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy 143a6384f5 block/block-copy: drop unused argument of block_copy()
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-21-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy 5b49c2bdc1 block/block-copy: drop unused block_copy_set_progress_callback()
Drop unused code.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-20-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy 71eed4cebe backup: move to block-copy
This brings async request handling and block-status driven chunk sizes
to backup out of the box, which improves backup performance.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-18-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy 511e7d31bf block/backup: drop extra gotos from backup_run()
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-17-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy d51590fc3e block/block-copy: make progress_bytes_callback optional
We are going to stop use of this callback in the following commit.
Still the callback handling code will be dropped in a separate commit.
So, for now let's make it optional.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-16-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy 2c59fd833a qapi: backup: add max-chunk and max-workers to x-perf struct
Add new parameters to configure future backup features. The patch
doesn't introduce aio backup requests (so we actually have only one
worker) neither requests larger than one cluster. Still, formally we
satisfy these maximums anyway, so add the parameters now, to facilitate
further patch which will really change backup job behavior.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-11-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy a6d23d56df block/block-copy: add block_copy_cancel
Add function to cancel running async block-copy call. It will be used
in backup.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-8-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy 7e032df0ea block/block-copy: add ratelimit to block-copy
We are going to directly use one async block-copy operation for backup
job, so we need rate limiter.

We want to maintain current backup behavior: only background copying is
limited and copy-before-write operations only participate in limit
calculation. Therefore we need one rate limiter for block-copy state
and boolean flag for block-copy call state for actual limitation.

Note, that we can't just calculate each chunk in limiter after
successful copying: it will not save us from starting a lot of async
sub-requests which will exceed limit too much. Instead let's use the
following scheme on sub-request creation:
1. If at the moment limit is not exceeded, create the request and
account it immediately.
2. If at the moment limit is already exceeded, drop create sub-request
and handle limit instead (by sleep).
With this approach we'll never exceed the limit more than by one
sub-request (which pretty much matches current backup behavior).

Note also, that if there is in-flight block-copy async call,
block_copy_kick() should be used after set-speed to apply new setup
faster. For that block_copy_kick() published in this patch.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-7-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy 2e099a9d29 block/block-copy: add list of all call-states
It simplifies debugging.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-6-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy 26be9d62dd block/block-copy: add max_chunk and max_workers parameters
They will be used for backup.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-5-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy de4641b46b block/block-copy: implement block_copy_async
We'll need async block-copy invocation to use in backup directly.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-4-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy 3b8c2329b5 block/block-copy: More explicit call_state
Refactor common path to use BlockCopyCallState pointer as parameter, to
prepare it for use in asynchronous block-copy (at least, we'll need to
run block-copy in a coroutine, passing the whole parameters as one
pointer).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-3-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy 86c6a3b690 qapi: backup: add perf.use-copy-range parameter
Experiments show, that copy_range is not always making things faster.
So, to make experimentation simpler, let's add a parameter. Some more
perf parameters will be added soon, so here is a new struct.

For now, add new backup qmp parameter with x- prefix for the following
reasons:

 - We are going to add more performance parameters, some will be
   related to the whole block-copy process, some only to background
   copying in backup (ignored for copy-before-write operations).
 - On the other hand, we are going to use block-copy interface in other
   block jobs, which will need performance options as well.. And it
   should be the same structure or at least somehow related.

So, there are too much unclean things about how the interface and now
we need the new options mostly for testing. Let's keep them
experimental for a while.

In do_backup_common() new x-perf parameter handled in a way to
make further options addition simpler.

We add use-copy-range with default=true, and we'll change the default
in further patch, after moving backup to use block-copy.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-2-vsementsov@virtuozzo.com>
[mreitz: s/5\.2/6.0/]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Andrey Shinkevich 205736f488 block: apply COR-filter to block-stream jobs
This patch completes the series with the COR-filter applied to
block-stream operations.

Adding the filter makes it possible in future implement discarding
copied regions in backing files during the block-stream job, to reduce
the disk overuse (we need control on permissions).

Also, the filter now is smart enough to do copy-on-read with specified
base, so we have benefit on guest reads even when doing block-stream of
the part of the backing chain.

Several iotests are slightly modified due to filter insertion.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201216061703.70908-14-vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy 0f6c94988a block/stream: add s->target_bs
Add a direct link to target bs for convenience and to simplify
following commit which will insert COR filter above target bs.

This is a part of original commit written by Andrey.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-13-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy 7f4a396d76 qapi: block-stream: add "bottom" argument
The code already don't freeze base node and we try to make it prepared
for the situation when base node is changed during the operation. In
other words, block-stream doesn't own base node.

Let's introduce a new interface which should replace the current one,
which will in better relations with the code. Specifying bottom node
instead of base, and requiring it to be non-filter gives us the
following benefits:

 - drop difference between above_base and base_overlay, which will be
   renamed to just bottom, when old interface dropped

 - clean way to work with parallel streams/commits on the same backing
   chain, which otherwise become a problem when we introduce a filter
   for stream job

 - cleaner interface. Nobody will surprised the fact that base node may
   disappear during block-stream, when there is no word about "base" in
   the interface.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201216061703.70908-11-vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Andrey Shinkevich 000e5a1cda stream: rework backing-file changing
Stream in stream_prepare calls bdrv_change_backing_file() to change
backing-file in the metadata of bs.

It may use either backing-file parameter given by user or just take
filename of base on job start.

Backing file format is determined by base on job finish.

There are some problems with this design, we solve only two by this
patch:

1. Consider scenario with backing-file unset. Current concept of stream
supports changing of the base during the job (we don't freeze link to
the base). So, we should not save base filename at job start,

  - let's determine name of the base on job finish.

2. Using direct base to determine filename and format is not very good:
base node may be a filter, so its filename may be JSON, and format_name
is not good for storing into qcow2 metadata as backing file format.

  - let's use unfiltered_base

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
  [vsementsov: change commit subject, change logic in stream_prepare]
Message-Id: <20201216061703.70908-10-vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Andrey Shinkevich e275458b29 copy-on-read: skip non-guest reads if no copy needed
If the flag BDRV_REQ_PREFETCH was set, skip idling read/write
operations in COR-driver. It can be taken into account for the
COR-algorithms optimization. That check is being made during the
block stream job by the moment.

Add the BDRV_REQ_PREFETCH flag to the supported_read_flags of the
COR-filter.

block: Modify the comment for the flag BDRV_REQ_PREFETCH as we are
going to use it alone and pass it to the COR-filter driver for further
processing.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-9-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Andrey Shinkevich 897dd0ec4f block: include supported_read_flags into BDS structure
Add the new member supported_read_flags to the BlockDriverState
structure. It will control the flags set for copy-on-read operations.
Make the block generic layer evaluate supported read flags before they
go to a block driver.

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
 [vsementsov: use assert instead of abort]
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-8-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Andrey Shinkevich e4c8fddde7 qapi: copy-on-read filter: add 'bottom' option
Add an option to limit copy-on-read operations to specified sub-chain
of backing-chain, to make copy-on-read filter useful for block-stream
job.

Suggested-by: Max Reitz <mreitz@redhat.com>
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
  [vsementsov: change subject, modified to freeze the chain,
   do some fixes]
Message-Id: <20201216061703.70908-6-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 11:26:54 +01:00
Andrey Shinkevich 880747a887 qapi: add filter-node-name to block-stream
Provide the possibility to pass the 'filter-node-name' parameter to the
block-stream job as it is done for the commit block job.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
  [vsementsov: comment indentation, s/Since: 5.2/Since: 6.0/]
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-5-vsementsov@virtuozzo.com>
[mreitz: s/commit/stream/]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 11:26:54 +01:00
Andrey Shinkevich 16e09a21af copy-on-read: add filter drop function
Provide API for the COR-filter removal. Also, drop the filter child
permissions for an inactive state when the filter node is being
removed.
To insert the filter, the block generic layer function
bdrv_insert_node() can be used.
The new function bdrv_cor_filter_drop() may be considered as an
intermediate solution before the QEMU permission update system has
overhauled. Then we are able to implement the API function
bdrv_remove_node() on the block generic layer.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-4-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 11:26:54 +01:00
Andrey Shinkevich 1252e03b8e copy-on-read: support preadv/pwritev_part functions
Add support for the recently introduced functions
bdrv_co_preadv_part()
and
bdrv_co_pwritev_part()
to the COR-filter driver.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201216061703.70908-2-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 11:26:54 +01:00
Peter Maydell 45240eed4f Yank patches patches for 2021-01-13
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl/+vJoSHGFybWJydUBy
 ZWRoYXQuY29tAAoJEDhwtADrkYZTyv8P/3Dqb9sM8p2WIeoUt5KjgkgWto8anCXM
 /vzAvVdOrPcLXgHF1HOEkcYkp5ZDzFb1PP+LNWsbIB82HmGUGfA7CiOpCpXEoDJs
 Z9OYR3K8W5fvSKYTI/m+s7d+9aYqRZajI6ON5M4Eqem0ZwV93/SZMBHOcs1GmvIR
 diXIztaWVgHjU1Q37MlvJTM4lLN3RH1kTWEdfp3dkNMO6HxBet0B1g7xwaPxKrgb
 4y8D/kk9TA1m4wnwrr9s1l3UnfDiZ7mSfsEsXMcTMmQQrAtonD/xiX2YFsHwf6+U
 9cX9BCG2XP65t3ynY5goddcjVX3R6SuP4YWSgYUpJGrSx+GVxXrtF0ZumgiCk+4T
 uv8sOkJdPe9/aRy86UkNzJx7V50nCZnJ2if9neukVjHGW4Hw4txcYV5/7ZuuDqKl
 tF5NdF/LcHEZLKkBt+4g4TpJ7vxEmJc8/ukn9niLT381SkRBFnnP+Bnl9taSlHOY
 xNtQJ3Jrcd/cvBjAPCKgtE4Fx6wzQG3c7Yg4WHxbZzcYZBPp4fifUTLCK5XKHqhb
 rlqCQIO9DzGz5tqOgG7hWmlodSafGiDsHo9tJVpyF5pSSUv3A4KzX2xW4FZZLKJn
 7uBrcV0bLmR4tyw+fr+u2EW0ClYrs/JxeXAnsnTp9JrzkXILf5RjEuK0Sc1ZIuZW
 cmuPa8027ybj
 =fLO1
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-yank-2021-01-13' into staging

Yank patches patches for 2021-01-13

# gpg: Signature made Wed 13 Jan 2021 09:25:46 GMT
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-yank-2021-01-13:
  tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test
  io: Document qmp oob suitability of qio_channel_shutdown and io_shutdown
  io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
  migration: Add yank feature
  chardev/char-socket.c: Add yank feature
  block/nbd.c: Add yank feature
  Introduce yank feature

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-01-13 14:19:24 +00:00
Lukas Straub fee091cdff block/nbd.c: Add yank feature
Register a yank function which shuts down the socket and sets
s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an
error occured.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <b73eb07db6d1fcd00667beb13ae6117260f002c3.1609167865.git.lukasstraub2@web.de>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2021-01-13 10:21:17 +01:00
Roman Bolshakov 3eacf70bb5 meson: Propagate gnutls dependency
crypto/tlscreds.h includes GnuTLS headers if CONFIG_GNUTLS is set, but
GNUTLS_CFLAGS, that describe include path, are not propagated
transitively to all users of crypto and build fails if GnuTLS headers
reside in non-standard directory (which is a case for homebrew on Apple
Silicon).

Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
Message-Id: <20210102125213.41279-1-r.bolshakov@yadro.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-12 12:38:03 +01:00
Peter Maydell 729cc68373 Remove superfluous timer_del() calls
This commit is the result of running the timer-del-timer-free.cocci
script on the whole source tree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201215154107.3255-4-peter.maydell@linaro.org
2021-01-08 15:13:38 +00:00
Paolo Bonzini 9db405a335 libiscsi: convert to meson
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-02 21:03:37 +01:00
Paolo Bonzini 8e4e2b551d curl: remove compatibility code, require 7.29.0
cURL 7.16.0 was released in October 2006.  Just remove code that is
in all likelihood not being used anywhere, and require the oldest version
found in currently supported distros, which is 7.29.0 from CentOS 7.

pkg-config is enough for QEMU, since it does not need extra information
such as the path for certicate authorities.  All supported platforms
today will all have pkg-config for curl, so we can drop curl-config.

Suggested-by: Daniel Berrangé <berrange@redhat.com>
Reviewed-by: Daniel Berrangé <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-02 21:03:37 +01:00
Paolo Bonzini 2f2a376a42 meson: use dependency to gate block modules
This allows converting the dependencies to meson options one by one.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-02 21:03:36 +01:00
Peter Maydell 1f7c02797f QAPI patches patches for 2020-12-19
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl/dynUSHGFybWJydUBy
 ZWRoYXQuY29tAAoJEDhwtADrkYZT3igP/3bWwsKR5vKVsDUTmMfrhcgaFvQiaYoG
 F29Bond8Xy0Zd0gl7OWh/5jKL0vGlrEVPrKfYLUjMnfkeRec/pOkIB2oOmIxpnPs
 9zi4kh2hQ3dEoRBuvSnnZzedetYPTuCpWMIjlztkgfxgcimqm8TPNVSxRaSApjC3
 Y8108wGwBWVf2C0rhKO9E2xA51uo6khy05i1psUtqUlC+PuDQ/OwzQHM2dnWdDB6
 kUwBDK17nhL6WwsYqCyKLSiDModReYfDiY8GS5MDLo74dzwXiatEefCR7+sbM4xq
 eX/SBoqoeS1jLPNuCryNeGNKvNA2KAbEJTnbQA2NxBXHgZ9/1SxVZFxuPp4nDMSQ
 N7BDuDI8YtJE479RjT/ZzRG65xadGBSe/HXkXM9mZwh1zitop8SVZ9fArFBHvNzw
 Y5zAv3fQd54+87psffg4dYFK0wGmqTabLEEuVzM8KIVqcAdYA2yC2b2EHy+vsxuq
 GMkr0WaA6Sq2gthXmzdTjmUPuHdan/NIhuV6d66SbPNH2oH31piptFxuznyFWSKV
 isciFFdUrkg5QrF8DSt2nmdwMFf8QGbszqP8QIGMzhJCCS9GXIiGG8f149++q8X8
 HO1lFAdLQJdrDwCYmfx36tOvi2rS/rcoTGgvg66UX3xKko1ruoxR1ZWcS54obJN6
 vEQDZ+PxubDg
 =vGLy
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2020-12-19' into staging

QAPI patches patches for 2020-12-19

# gpg: Signature made Sat 19 Dec 2020 09:40:05 GMT
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-qapi-2020-12-19: (33 commits)
  qobject: Make QString immutable
  block: Use GString instead of QString to build filenames
  keyval: Use GString to accumulate value strings
  json: Use GString instead of QString to accumulate strings
  migration: Replace migration's JSON writer by the general one
  qobject: Factor JSON writer out of qobject_to_json()
  qobject: Factor quoted_str() out of to_json()
  qobject: Drop qstring_get_try_str()
  qobject: Drop qobject_get_try_str()
  Revert "qobject: let object_property_get_str() use new API"
  block: Avoid qobject_get_try_str()
  qmp: Fix tracing of non-string command IDs
  qobject: Move internals to qobject-internal.h
  hw/rdma: Replace QList by GQueue
  Revert "qstring: add qstring_free()"
  qobject: Change qobject_to_json()'s value to GString
  qobject: Use GString instead of QString to accumulate JSON
  qobject: Make qobject_to_json_pretty() take a pretty argument
  monitor: Use GString instead of QString for output buffer
  hmp: Simplify how qmp_human_monitor_command() gets output
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-01-01 14:33:03 +00:00
Peter Maydell 26f6b15e26 Block patches:
- New block filter: preallocate (which, on writes beyond an image file's
   end, allocates big chunks of data so that such post-EOF writes will
   occur less frequently)
 - write-zeroes and block-status support for Quorum
 - Implementation of truncate for the nvme block driver similarly to the
   existing implementations for host block devices and iscsi devices
 - Block layer refactoring: Drop the tighten_restrictions concept in the
   block permission functions
 - iotest fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFGBAABCAAwFiEEkb62CjDbPohX0Rgp9AfbAGHVz0AFAl/cwIoSHG1yZWl0ekBy
 ZWRoYXQuY29tAAoJEPQH2wBh1c9AnvMH+gOnZCwEUKWuBxGX3Wjb/kqV1OuhAhcP
 IVrKLRnqdarCYMQ9M4SZL6pedfsujHA7vClTV7NTrenXBsEIradBQ59ztQ0oDirS
 4ipIjVtNqj7m86l+IRZDq5HlwOYwwFnWogmLo2bcmNJGLpPQQfrhL2vRJ1wLgFYk
 WjeAVlkkYcHnTIDvs4ne9WRSlxGVBWJ4X5nSlRdZqeyUcMY9v4wL4P9Wc4ZuORmq
 /5HRcT5JKGaT2bAueaqAGEdtPFGbazEP5uU7MTTK/fueDKIRAXO2d0gqhANtOOJQ
 7hMmKhwOPOOhrrpCVi9nxsVwdCOHfurV0km6cOs+Iprm/Wm2UtuS/A8=
 =z+7k
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2020-12-18' into staging

Block patches:
- New block filter: preallocate (which, on writes beyond an image file's
  end, allocates big chunks of data so that such post-EOF writes will
  occur less frequently)
- write-zeroes and block-status support for Quorum
- Implementation of truncate for the nvme block driver similarly to the
  existing implementations for host block devices and iscsi devices
- Block layer refactoring: Drop the tighten_restrictions concept in the
  block permission functions
- iotest fixes

# gpg: Signature made Fri 18 Dec 2020 14:45:30 GMT
# gpg:                using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40
# gpg:                issuer "mreitz@redhat.com"
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full]
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* remotes/maxreitz/tags/pull-block-2020-12-18: (30 commits)
  iotests: Fix _send_qemu_cmd with bash 5.1
  iotests/102: Pass $QEMU_HANDLE to _send_qemu_cmd
  block/nvme: Implement fake truncate() coroutine
  quorum: Implement bdrv_co_pwrite_zeroes()
  quorum: Implement bdrv_co_block_status()
  scripts/simplebench: add bench_prealloc.py
  simplebench/results_to_text: make executable
  simplebench/results_to_text: add difference line to the table
  simplebench/results_to_text: improve view of the table
  simplebench: move results_to_text() into separate file
  simplebench: rename ascii() to results_to_text()
  scripts/simplebench: use standard deviation for +- error
  scripts/simplebench: support iops
  scripts/simplebench: fix grammar: s/successed/succeeded/
  iotests: add 298 to test new preallocate filter driver
  iotests.py: execute_setup_common(): add required_fmts argument
  iotests: qemu_io_silent: support --image-opts
  qemu-io: add preallocate mode parameter for truncate command
  block: introduce preallocate filter
  block: bdrv_check_perm(): process children anyway
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-12-31 23:26:46 +00:00
Markus Armbruster eab3a4678b qobject: Change qobject_to_json()'s value to GString
qobject_to_json() and qobject_to_json_pretty() build a GString, then
covert it to QString.  Just one of the callers actually needs a
QString: qemu_rbd_parse_filename().  A few others need a string they
can modify: qmp_send_response(), qga's send_response(), to_json_str(),
and qmp_fd_vsend_fds().  The remainder just need a string.

Change qobject_to_json() and qobject_to_json_pretty() to return the
GString.

qemu_rbd_parse_filename() now has to convert to QString.  All others
save a QString temporary.  to_json_str() actually becomes a bit
simpler, because GString provides more convenient modification
functions.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201211171152.146877-6-armbru@redhat.com>
2020-12-19 10:38:43 +01:00
Eric Blake 54aa3de72e qapi: Use QAPI_LIST_PREPEND() where possible
Anywhere we create a list of just one item or by prepending items
(typically because order doesn't matter), we can use
QAPI_LIST_PREPEND().  But places where we must keep the list in order
by appending remain open-coded until later patches.

Note that as a side effect, this also performs a cleanup of two minor
issues in qga/commands-posix.c: the old code was performing
 new = g_malloc0(sizeof(*ret));
which 1) is confusing because you have to verify whether 'new' and
'ret' are variables with the same type, and 2) would conflict with C++
compilation (not an actual problem for this file, but makes
copy-and-paste harder).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201113011340.463563-5-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
[Straightforward conflicts due to commit a8aa94b5f8 "qga: update
schema for guest-get-disks 'dependents' field" and commit a10b453a52
"target/mips: Move mips_cpu_add_definition() from helper.c to cpu.c"
resolved.  Commit message tweaked.]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2020-12-19 10:20:14 +01:00
Markus Armbruster be7c5ddd0d block/vpc: Use sizeof() instead of HEADER_SIZE for footer size
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-10-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:43:30 +01:00
Markus Armbruster a3d2761719 block/vpc: Pass footer buffers as VHDFooter * instead of uint8_t *
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-9-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:43:28 +01:00
Markus Armbruster 275734e479 block/vpc: Pad VHDFooter, replace uint8_t[] buffers
Pad VHDFooter as specified in the "Virtual Hard Disk Image Format
Specification" version 1.0[*].  Change footer buffers from
uint8_t[HEADER_SIZE] to VHDFooter.  Their size remains the same.

The VHDFooter * variables pointing to a VHDFooter variable right next
to it are now silly.  Eliminate them, and shorten the remaining
variables' names.

Most variables pointing to s->footer are now also silly.  Eliminate
them, too.

[*] http://download.microsoft.com/download/f/f/e/ffef50a5-07dd-4cf8-aaa3-442c0673a029/Virtual%20Hard%20Disk%20Format%20Spec_10_18_06.doc

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-8-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:43:26 +01:00
Markus Armbruster 3d6101a3f2 block/vpc: Use sizeof() instead of 1024 for dynamic header size
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-7-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:43:23 +01:00
Markus Armbruster e326f0783e block/vpc: Pad VHDDynDiskHeader, replace uint8_t[] buffers
Pad VHDDynDiskHeader as specified in the "Virtual Hard Disk Image
Format Specification" version 1.0[*].  Change dynamic disk header
buffers from uint8_t[1024] to VHDDynDiskHeader.  Their size remains
the same.

The VHDDynDiskHeader * variables pointing to a VHDDynDiskHeader
variable right next to it are now silly.  Eliminate them.

[*] http://download.microsoft.com/download/f/f/e/ffef50a5-07dd-4cf8-aaa3-442c0673a029/Virtual%20Hard%20Disk%20Format%20Spec_10_18_06.doc

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-6-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:43:18 +01:00
Markus Armbruster 7550379ded block/vpc: Make vpc_checksum() take void *
Some of the next commits will checksum structs.  Change vpc_checksum()
to take void * instead of uint8_t, to save us pointless casts to
uint8_t *.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-5-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:43:16 +01:00
Markus Armbruster a18dc3a14d block/vpc: Don't abuse the footer buffer for dynamic header
create_dynamic_disk() takes a buffer holding the footer as first
argument.  It writes out the footer (512 bytes), then reuses the
buffer to initialize and write out the dynamic header (1024 bytes).

Works, because the caller passes a buffer that is large enough for
both purposes.  I hate that.

Use a separate buffer for the dynamic header, and adjust the caller's
buffer.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-4-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:43:14 +01:00
Markus Armbruster b0ce8cb0e8 block/vpc: Don't abuse the footer buffer as BAT sector buffer
create_dynamic_disk() takes a buffer holding the footer as first
argument.  It writes out the footer (512 bytes), then reuses the
buffer to initialize and write out the dynamic header (1024 bytes),
then reuses it again to initialize and write out BAT sectors (512).

Works, because the caller passes a buffer that is large enough for all
three purposes.  I hate that.

Use a separate buffer for writing out BAT sectors.  The next commit
will do the same for the dynamic header.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-3-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:43:06 +01:00
Markus Armbruster 02df95c4a1 block/vpc: Make vpc_open() read the full dynamic header
The dynamic header's size is 1024 bytes.

vpc_open() reads only the 512 bytes of the dynamic header into buf[].
Works, because it doesn't actually access the second half.  However, a
colleague told me that GCC 11 warns:

    ../block/vpc.c:358:51: error: array subscript 'struct VHDDynDiskHeader[0]' is partly outside array bounds of 'uint8_t[512]' [-Werror=array-bounds]

Clean up to read the full header.

Rename buf[] to dyndisk_header_buf[] while there.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-2-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:42:34 +01:00
Philippe Mathieu-Daudé c8807c5edc block/nvme: Implement fake truncate() coroutine
NVMe drive cannot be shrunk.

Since commit c80d8b06cf we can use the @exact parameter (set
to false) to return success if the block device is larger than
the requested offset (even if we can not be shrunk).

Use this parameter to implement the NVMe truncate() coroutine,
similarly how it is done for the iscsi and file-posix drivers
(see commit 82325ae5f2 "Evaluate @exact in protocol drivers").

Reported-by: Xueqiang Wei <xuwei@redhat.com>
Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20201210125202.858656-1-philmd@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Alberto Garcia 5cddb2e95f quorum: Implement bdrv_co_pwrite_zeroes()
This simply calls bdrv_co_pwrite_zeroes() in all children.

bs->supported_zero_flags is also set to the flags that are supported
by all children.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <2f09c842781fe336b4c2e40036bba577b7430190.1605286097.git.berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Alberto Garcia ef9bba1484 quorum: Implement bdrv_co_block_status()
The quorum driver does not implement bdrv_co_block_status() and
because of that it always reports to contain data even if all its
children are known to be empty.

One consequence of this is that if we for example create a quorum with
a size of 10GB and we mirror it to a new image the operation will
write 10GB of actual zeroes to the destination image wasting a lot of
time and disk space.

Since a quorum has an arbitrary number of children of potentially
different formats there is no way to report all possible allocation
status flags in a way that makes sense, so this implementation only
reports when a given region is known to contain zeroes
(BDRV_BLOCK_ZERO) or not (BDRV_BLOCK_DATA).

If all children agree that a region contains zeroes then we can return
BDRV_BLOCK_ZERO using the smallest size reported by the children
(because all agree that a region of at least that size contains
zeroes).

If at least one child disagrees we have to return BDRV_BLOCK_DATA.
In this case we use the largest of the sizes reported by the children
that didn't return BDRV_BLOCK_ZERO (because we know that there won't
be an agreement for at least that size).

Signed-off-by: Alberto Garcia <berto@igalia.com>
Tested-by: Tao Xu <tao3.xu@intel.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <db83149afcf0f793effc8878089d29af4c46ffe1.1605286097.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy 33fa2222eb block: introduce preallocate filter
It's intended to be inserted between format and protocol nodes to
preallocate additional space (expanding protocol file) on writes
crossing EOF. It improves performance for file-systems with slow
allocation.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201021145859.11201-9-vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
[mreitz: Two comment fixes, and bumped the version from 5.2 to 6.0]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy d1a764d126 block: introduce BDRV_REQ_NO_WAIT flag
Add flag to make serialising request no wait: if there are conflicting
requests, just return error immediately. It's will be used in upcoming
preallocate filter.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201021145859.11201-7-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy 8ac5aab255 block: bdrv_mark_request_serialising: split non-waiting function
We'll need a separate function, which will only "mark" request
serialising with specified align but not wait for conflicting
requests. So, it will be like old bdrv_mark_request_serialising(),
before merging bdrv_wait_serialising_requests_locked() into it.

To reduce the possible mess, let's do the following:

Public function that does both marking and waiting will be called
bdrv_make_request_serialising, and private function which will only
"mark" will be called tracked_request_set_serialising().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201021145859.11201-6-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy ec1c886831 block/io: bdrv_wait_serialising_requests_locked: drop extra bs arg
bs is linked in req, so no needs to pass it separately. Most of
tracked-requests API doesn't have bs argument. Actually, after this
patch only tracked_request_begin has it, but it's for purpose.

While being here, also add a comment about what "_locked" is.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201021145859.11201-5-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy 3183937ff9 block/io: split out bdrv_find_conflicting_request
To be reused in separate.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201021145859.11201-4-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy 2e36da62cf block/io.c: drop assertion on double waiting for request serialisation
The comments states, that on misaligned request we should have already
been waiting. But for bdrv_padding_rmw_read, we called
bdrv_mark_request_serialising with align = request_alignment, and now
we serialise with align = cluster_size. So we may have to wait again
with larger alignment.

Note, that the only user of BDRV_REQ_SERIALISING is backup which issues
cluster-aligned requests, so seems the assertion should not fire for
now. But it's wrong anyway.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20201021145859.11201-3-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Peter Lieven 182454dc63 block/nfs: fix int overflow in nfs_client_open_qdict
nfs_client_open returns the file size in sectors. This effectively
makes it impossible to open files larger than 1TB.

Fixes: c22a034545
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <20201209121735.16437-1-pl@kamp.de>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 11:48:39 +01:00
Pan Nengyuan cb8d0851f1 block/file-posix: fix a possible undefined behavior
local_err is not initialized to NULL, it will cause a assert error as below:
qemu/util/error.c:59: error_setv: Assertion `*errp == NULL' failed.

Fixes: c644751069
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Message-Id: <20201023061218.2080844-8-kuhn.chenqun@huawei.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-12-13 23:56:16 +01:00
Kevin Wolf 960d5fb3e8 block: Fix deadlock in bdrv_co_yield_to_drain()
If bdrv_co_yield_to_drain() is called for draining a block node that
runs in a different AioContext, it keeps that AioContext locked while it
yields and schedules a BH in the AioContext to do the actual drain.

As long as executing the BH is the very next thing that the event loop
of the node's AioContext does, this actually happens to work, but when
it tries to execute something else that wants to take the AioContext
lock, it will deadlock. (In the bug report, this other thing is a
virtio-scsi device running virtio_scsi_data_plane_handle_cmd().)

Instead, always drop the AioContext lock across the yield and reacquire
it only when the coroutine is reentered. The BH needs to unconditionally
take the lock for itself now.

This fixes the 'block_resize' QMP command on a block node that runs in
an iothread.

Cc: qemu-stable@nongnu.org
Fixes: eb94b81a94
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1903511
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201203172311.68232-4-kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Vladimir Sementsov-Ogievskiy 8b1170012b block: introduce BDRV_MAX_LENGTH
We are going to modify block layer to work with 64bit requests. And
first step is moving to int64_t type for both offset and bytes
arguments in all block request related functions.

It's mostly safe (when widening signed or unsigned int to int64_t), but
switching from uint64_t is questionable.

So, let's first establish the set of requests we want to work with.
First signed int64_t should be enough, as off_t is signed anyway. Then,
obviously offset + bytes should not overflow.

And most interesting: (offset + bytes) being aligned up should not
overflow as well. Aligned to what alignment? First thing that comes in
mind is bs->bl.request_alignment, as we align up request to this
alignment. But there is another thing: look at
bdrv_mark_request_serialising(). It aligns request up to some given
alignment. And this parameter may be bdrv_get_cluster_size(), which is
often a lot greater than bs->bl.request_alignment.
Note also, that bdrv_mark_request_serialising() uses signed int64_t for
calculations. So, actually, we already depend on some restrictions.

Happily, bdrv_get_cluster_size() returns int and
bs->bl.request_alignment has 32bit unsigned type, but defined to be a
power of 2 less than INT_MAX. So, we may establish, that INT_MAX is
absolute maximum for any kind of alignment that may occur with the
request.

Note, that bdrv_get_cluster_size() is not documented to return power
of 2, still bdrv_mark_request_serialising() behaves like it is.
Also, backup uses bdi.cluster_size and is not prepared to it not being
power of 2.
So, let's establish that Qemu supports only power-of-2 clusters and
alignments.

So, alignment can't be greater than 2^30.

Finally to be safe with calculations, to not calculate different
maximums for different nodes (depending on cluster size and
request_alignment), let's simply set QEMU_ALIGN_DOWN(INT64_MAX, 2^30)
as absolute maximum bytes length for Qemu. Actually, it's not much less
than INT64_MAX.

OK, then, let's apply it to block/io.

Let's consider all block/io entry points of offset/bytes:

4 bytes/offset interface functions: bdrv_co_preadv_part(),
bdrv_co_pwritev_part(), bdrv_co_copy_range_internal() and
bdrv_co_pdiscard() and we check them all with bdrv_check_request().

We also have one entry point with only offset: bdrv_co_truncate().
Check the offset.

And one public structure: BdrvTrackedRequest. Happily, it has only
three external users:

 file-posix.c: adopted by this patch
 write-threshold.c: only read fields
 test-write-threshold.c: sets obviously small constant values

Better is to make the structure private and add corresponding
interfaces.. Still it's not obvious what kind of interface is needed
for file-posix.c. Let's keep it public but add corresponding
assertions.

After this patch we'll convert functions in block/io.c to int64_t bytes
and offset parameters. We can assume that offset/bytes pair always
satisfy new restrictions, and make
corresponding assertions where needed. If we reach some offset/bytes
point in block/io.c missing bdrv_check_request() it is considered a
bug. As well, if block/io.c modifies a offset/bytes request, expanding
it more then aligning up to request_alignment, it's a bug too.

For all io requests except for discard we keep for now old restriction
of 32bit request length.

iotest 206 output error message changed, as now test disk size is
larger than new limit. Add one more test case with new maximum disk
size to cover too-big-L1 case.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201203222713.13507-5-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Vladimir Sementsov-Ogievskiy f4dad307ef block/io: bdrv_check_byte_request(): drop bdrv_is_inserted()
Move bdrv_is_inserted() calls into callers.

We are going to make bdrv_check_byte_request() a clean thing.
bdrv_is_inserted() is not about checking the request, it's about
checking the bs. So, it should be separate.

With this patch we probably change error path for some failure
scenarios. But depending on the fact that querying too big request on
empty cdrom (or corrupted qcow2 node with no drv) will result in EIO
and not ENOMEDIUM would be very strange. More over, we are going to
move to 64bit requests, so larger requests will be allowed anyway.

More over, keeping in mind that cdrom is the only driver that has
.bdrv_is_inserted() handler it's strange that we should care so much
about it in generic block layer, intuitively we should just do read and
write, and cdrom driver should return correct errors if it is not
inserted. But it's a work for another series.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201203222713.13507-4-vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Vladimir Sementsov-Ogievskiy 33985614bd block/io: bdrv_refresh_limits(): use ERRP_GUARD
This simplifies following commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201203222713.13507-3-vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Vladimir Sementsov-Ogievskiy 9b100af30f block/file-posix: fix workaround in raw_do_pwrite_zeroes()
We should not set overlap_bytes:

1. Don't worry: it is calculated by bdrv_mark_request_serialising() and
   will be equal to or greater than bytes anyway.

2. If the request was already aligned up to some greater alignment,
   than we may break things: we reduce overlap_bytes, and further
   bdrv_mark_request_serialising() may not help, as it will not restore
   old bigger alignment.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201203222713.13507-2-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Li Feng eb43ea16dc file-posix: check the use_lock before setting the file lock
The scenario is that when accessing a volume on an NFS filesystem
without supporting the file lock,  Qemu will complain "Failed to lock
byte 100", even when setting the file.locking = off.

We should do file lock related operations only when the file.locking is
enabled, otherwise, the syscall of 'fcntl' will return non-zero.

Signed-off-by: Li Feng <fengli@smartx.com>
Message-Id: <1607341446-85506-1-git-send-email-fengli@smartx.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Max Reitz df4ea7091b fuse: Implement hole detection through lseek
This is a relatively new feature in libfuse (available since 3.8.0,
which was released in November 2019), so we have to add a dedicated
check whether it is available before making use of it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201027190600.192171-7-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Max Reitz 4ca37a96a7 fuse: (Partially) implement fallocate()
This allows allocating areas after the (old) EOF as part of a growing
resize, writing zeroes, and discarding.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201027190600.192171-6-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Max Reitz 4fba06d594 fuse: Allow growable exports
These will behave more like normal files in that writes beyond the EOF
will automatically grow the export size.

As an optimization, keep the RESIZE permission for growable exports so
we do not have to take it for every post-EOF write.  (This permission is
not released when the export is destroyed, because at that point the
BlockBackend is destroyed altogether anyway.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201027190600.192171-5-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Max Reitz 41429e3d79 fuse: Implement standard FUSE operations
This makes the export actually useful instead of only producing errors
whenever it is accessed.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201027190600.192171-4-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:39 +01:00
Max Reitz 0c9b70d590 fuse: Allow exporting BDSs via FUSE
block-export-add type=fuse allows mounting block graph nodes via FUSE on
some existing regular file.  That file should then appears like a raw
disk image, and accesses to it result in accesses to the exported BDS.

Right now, we only implement the necessary block export functions to set
it up and shut it down.  We do not implement any access functions, so
accessing the mount point only results in errors.  This will be
addressed by a followup patch.

We keep a hash table of exported mount points, because we want to be
able to detect when users try to use a mount point twice.  This is
because we invoke stat() to check whether the given mount point is a
regular file, but if that file is served by ourselves (because it is
already used as a mount point), then this stat() would have to be served
by ourselves, too, which is impossible to do while we (as the caller)
are waiting for it to settle.  Therefore, keep track of mount point
paths to at least catch the most obvious instances of that problem.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201027190600.192171-3-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:39 +01:00
Gan Qixin c208b0ef96 block/iscsi: Use lock guard macros
Replace manual lock()/unlock() calls with lock guard macros
(QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/iscsi.

Signed-off-by: Gan Qixin <ganqixin@huawei.com>
Message-Id: <20201203075055.127773-5-ganqixin@huawei.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:39 +01:00
Gan Qixin 3af613ebdb block/throttle-groups: Use lock guard macros
Replace manual lock()/unlock() calls with lock guard macros
(QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/throttle-groups.

Signed-off-by: Gan Qixin <ganqixin@huawei.com>
Message-Id: <20201203075055.127773-4-ganqixin@huawei.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:39 +01:00
Gan Qixin f5056b70e6 block/curl: Use lock guard macros
Replace manual lock()/unlock() calls with lock guard macros
(QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/curl.

Signed-off-by: Gan Qixin <ganqixin@huawei.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20201203075055.127773-3-ganqixin@huawei.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:39 +01:00
Gan Qixin c37c973660 block/accounting: Use lock guard macros
Replace manual lock()/unlock() calls with lock guard macros
(QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/accounting.

Signed-off-by: Gan Qixin <ganqixin@huawei.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20201203075055.127773-2-ganqixin@huawei.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:39 +01:00
Markus Armbruster 6cc0667d9b Tweak a few "Parameter 'NAME' expects THING" error message
Change to "expects a THING" where that's an obvious improvement

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201113082626.2725812-11-armbru@redhat.com>
2020-12-10 17:16:44 +01:00
Stefan Hajnoczi 552c2c4c10 block/export: avoid g_return_val_if() input validation
Do not validate input with g_return_val_if(). This API is intended for
checking programming errors and is compiled out with -DG_DISABLE_CHECKS.

Use an explicit if statement for input validation so it cannot
accidentally be compiled out.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201118091644.199527-5-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-09 13:04:17 -05:00
Marc-André Lureau 0df750e9d3 libvhost-user: make it a meson subproject
By making libvhost-user a subproject, check it builds
standalone (without the global QEMU cflags etc).

Note that the library still relies on QEMU include/qemu/atomic.h and
linux_headers/.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20201125100640.366523-6-marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-08 13:48:58 -05:00
Maxim Levitsky c8bf9a9169 qcow2: Fix corruption on write_zeroes with MAY_UNMAP
Commit 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
introduced a subtle change to code in zero_in_l2_slice:

It swapped the order of

1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
2. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
3. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);

To

1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
2. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
3. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);

It seems harmless, however the call to qcow2_free_any_clusters can
trigger a cache flush which can mark the L2 table as clean, and
assuming that this was the last write to it, a stale version of it
will remain on the disk.

Now we have a valid L2 entry pointing to a freed cluster. Oops.

Fixes: 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
[ kwolf: Fixed to restore the correct original order from before
  205fa50750; added comments like in discard_in_l2_slice(). ]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201124092815.39056-1-kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-11-24 11:29:41 +01:00
Peter Maydell 683685e72d Pull request for 5.2
NVMe fixes to solve IOMMU issues on non-x86 and error message/tracing
 improvements. Elena Afanasova's ioeventfd fixes are also included.
 
 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAl+ixjgACgkQnKSrs4Gr
 c8iZYgf+OB2eAGsdZO97fKh6VUUoRKa+BgWKuh37Cfpp3q+dLuIFMSKfU/UgprLc
 aowt6uTFfwudDV9KltUB2EiXIzpuf7JhMNOiDRkyEvYSj4KHRPsQmFCd35Nrjezy
 VvxSGafe2Z60Qnvcx+iGeMATSFX9YTcTZeHttC07v7dWn/yEK3b1hobcmjCcwWeR
 Ud8pjMyh5E2z/NpW8E669/byJf9iahx3LSQxSWt+9PVTPuftAB0Suu+m6svz1wvk
 sjVfIbtVWCp2BdGf5U6a2rEqF3+kIcFkfHp+MwgE0EdMz1wfjudaPl13a0C4DSun
 PSt9E+Ct5BTrDUvqCHvQDOaFiMZTPg==
 =Poyb
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha-gitlab/tags/block-pull-request' into staging

Pull request for 5.2

NVMe fixes to solve IOMMU issues on non-x86 and error message/tracing
improvements. Elena Afanasova's ioeventfd fixes are also included.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

# gpg: Signature made Wed 04 Nov 2020 15:18:16 GMT
# gpg:                using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha-gitlab/tags/block-pull-request: (33 commits)
  util/vfio-helpers: Assert offset is aligned to page size
  util/vfio-helpers: Convert vfio_dump_mapping to trace events
  util/vfio-helpers: Improve DMA trace events
  util/vfio-helpers: Trace where BARs are mapped
  util/vfio-helpers: Trace PCI BAR region info
  util/vfio-helpers: Trace PCI I/O config accesses
  util/vfio-helpers: Improve reporting unsupported IOMMU type
  block/nvme: Fix nvme_submit_command() on big-endian host
  block/nvme: Fix use of write-only doorbells page on Aarch64 arch
  block/nvme: Align iov's va and size on host page size
  block/nvme: Change size and alignment of prp_list_pages
  block/nvme: Change size and alignment of queue
  block/nvme: Change size and alignment of IDENTIFY response buffer
  block/nvme: Correct minimum device page size
  block/nvme: Set request_alignment at initialization
  block/nvme: Simplify nvme_cmd_sync()
  block/nvme: Simplify ADMIN queue access
  block/nvme: Correctly initialize Admin Queue Attributes
  block/nvme: Use definitions instead of magic values in add_io_queue()
  block/nvme: Introduce Completion Queue definitions
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-11-23 13:03:13 +00:00
Peter Maydell c8e5c4b246 Patches for 5.2.0-rc2:
- quorum: Fix crash with rewrite-corrupted and without read-write user
 - io_uring: do not use pointer after free
 - file-posix: Use fallback path for -EBUSY from FALLOC_FL_PUNCH_HOLE
 - iotests: Fix failure on Python 3.9 due to use of a deprecated function
 - char-stdio: Fix QMP default for 'signal'
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAl+zt1URHGt3b2xmQHJl
 ZGhhdC5jb20ACgkQfwmycsiPL9ZxyxAAu8GIOeAb7atQvc+KpeBTUG4A+tfAXkC+
 iUYdIpFeWWgmGf7myu3nlaAkeTDk6qHalmzkGRHi3yhX4eNIh5Sdff1YwPcZwf+q
 GLIqFFTW0z1Bd36N8G7Mkf04nKX4QTHqp6THHtSt9jNs56h5OP3axPXVA/3v9y8B
 4ZAkOOvwnwO+U94crhy5y5pX/Vwafv/Dz4DH9hEupE+EI9AuzjZLBrS+sgkxjhmu
 gvHpDSqm6NXwWQA5a24J6NzCy3n/Fw/rqmnoOrN8eRz+4DSCMVDnTDDEMFLa/UoK
 Ci7AqWfG/MnQ4GrGsOx80KJhAFLTmI60vfnUizKtEjL/HJyK5PDyM+VxHz+P/Tkq
 4hqQsHEsll4mAQiKCrrKOOXhn+YC4DhY/5O1EzEfhqfUjI+BFE9iC7LuqQevwKPL
 gytup7eoZjIHMtnKwY1B2ApAqHtodswjHkefcjEcvSlhqGi/BvwuWmeYlFXmA3r0
 YO8fvbYJrwHwJy7CzMb5Rgs2461QGERmXoCsBxLAiqXU9rhpOZ6gKXIjjlYojZ8M
 W0kqbaccTRPuhooFdEQ9RTPSkX7AX2bI0nOoPxfz3YD/siw35YwnUkJqvQbckvJd
 vpPkCL5jt3d9sfO0z1xjSH2ey9bevSReYpCsk+kIZl7V2XoDAW0Nbi0Td3pW4j6x
 dEkg/+sjF+o=
 =0pFF
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Patches for 5.2.0-rc2:

- quorum: Fix crash with rewrite-corrupted and without read-write user
- io_uring: do not use pointer after free
- file-posix: Use fallback path for -EBUSY from FALLOC_FL_PUNCH_HOLE
- iotests: Fix failure on Python 3.9 due to use of a deprecated function
- char-stdio: Fix QMP default for 'signal'

# gpg: Signature made Tue 17 Nov 2020 11:43:17 GMT
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream:
  iotests/081: Test rewrite-corrupted without WRITE
  iotests/081: Filter image format after testdir
  quorum: Require WRITE perm with rewrite-corrupted
  io_uring: do not use pointer after free
  file-posix: allow -EBUSY errors during write zeros on raw block devices
  iotests: Replace deprecated ConfigParser.readfp()
  char-stdio: Fix QMP default for 'signal'

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-11-17 15:58:51 +00:00
Peter Maydell 1c7ab0930a pc,vhost: fixes
Fixes all over the place.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl+zlRgPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpmf8H/0BEjxnINJCN12Te+Mot8K9fjwc0zE0SUuYY
 25LogfJMCfVy0SZk0ZQV9z33GEL5XyMlXQjEpLmlX4d3mOBLcbutI6UVLhu8+Ixj
 89+jFphxIQPDOpA7BnPOD4AJ6TlhbewZ41QBR/J/qv946HayFW9QCAUywuj6H80m
 T3lw0FmPkd6/YupUdUm0pPgJjowckGis+cAa9UkTlqp8jpzFur28N02fE0L6QO3Z
 lR6zsk4yEvsVoeXSkEkmSqZGNcwoQCf4BhmDuD7lBLZ0LBvmd37CCoakStpdnQPH
 Swunmf7Q1H6LRtF7s8ZKXBB/ecVnss3kFTFj5KWx3fJH2SJuHG8=
 =v205
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pc,vhost: fixes

Fixes all over the place.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Tue 17 Nov 2020 09:17:12 GMT
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream:
  vhost-user-blk/scsi: Fix broken error handling for socket call
  contrib/libvhost-user: Fix bad printf format specifiers
  hw/i386/acpi-build: Fix maybe-uninitialized error when ACPI hotplug off
  configure: mark vhost-user Linux-only
  vhost-user-blk-server: depend on CONFIG_VHOST_USER
  meson: move vhost_user_blk_server to meson.build
  vhost-user: fix VHOST_USER_ADD/REM_MEM_REG truncation

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

# Conflicts:
#	meson.build
2020-11-17 11:50:11 +00:00
Max Reitz 9ca5b0e842 quorum: Require WRITE perm with rewrite-corrupted
Using rewrite-corrupted means quorum may issue writes to its children
just from receiving read requests from its parents.  Thus, it must take
the WRITE permission when rewrite-corrupted is used.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201113211718.261671-2-mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-11-17 12:38:28 +01:00
Paolo Bonzini bd89f93603 io_uring: do not use pointer after free
Even though only the pointer value is only printed, it is untidy
and Coverity complains.

Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20201113154102.1460459-1-pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-11-17 12:26:48 +01:00
Maxim Levitsky ece4fa9152 file-posix: allow -EBUSY errors during write zeros on raw block devices
On Linux, fallocate(fd, FALLOC_FL_PUNCH_HOLE) when it is used on a block device,
without O_DIRECT can return -EBUSY if it races with another write to the same page.

Since this is rare and discard is not a critical operation, ignore this error

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20201111153913.41840-2-mlevitsk@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-11-17 12:26:48 +01:00
Chetan Pant 61f3c91a67 nomaintainer: Fix Lesser GPL version number
There is no "version 2" of the "Lesser" General Public License.
It is either "GPL version 2.0" or "Lesser GPL version 2.1".
This patch replaces all occurrences of "Lesser GPL version 2" with
"Lesser GPL version 2.1" in comment section.

This patch contains all the files, whose maintainer I could not get
from ‘get_maintainer.pl’ script.

Signed-off-by: Chetan Pant <chetan4windows@gmail.com>
Message-Id: <20201023124424.20177-1-chetan4windows@gmail.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
[thuth: Adapted exec.c and qdev-monitor.c to new location]
Signed-off-by: Thomas Huth <thuth@redhat.com>
2020-11-15 17:04:40 +01:00
Stefan Hajnoczi e5e856c1eb meson: move vhost_user_blk_server to meson.build
The --enable/disable-vhost-user-blk-server options were implemented in
./configure. There has been confusion about them and part of the problem
is that the shell syntax used for setting the default value is not easy
to read. Move the option over to meson where the conditions are easier
to understand:

  have_vhost_user_blk_server = (targetos == 'linux')

  if get_option('vhost_user_blk_server').enabled()
      if targetos != 'linux'
          error('vhost_user_blk_server requires linux')
      endif
  elif get_option('vhost_user_blk_server').disabled() or not have_system
      have_vhost_user_blk_server = false
  endif

This patch does not change behavior.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201110171121.1265142-2-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2020-11-12 09:19:40 -05:00
shiliyang 5f14f31d2b block: Fix some code style problems, "foo* bar" should be "foo *bar"
There have some code style problems be found when read the block driver code.
So I fixes some problems of this error, ERROR: "foo* bar" should be "foo *bar".

Signed-off-by: Liyang Shi <shiliyang@huawei.com>
Reported-by: Euler Robot <euler.robot@huawei.com>
Message-Id: <3211f389-6d22-46c1-4a16-e6a2ba66f070@huawei.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-11-09 18:42:47 +01:00
Yonggang Luo c63b0201ae block: Fixes nfs compiling error on msys2/mingw
These compiling errors are fixed:
../block/nfs.c:27:10: fatal error: poll.h: No such file or directory
   27 | #include <poll.h>
      |          ^~~~~~~~
compilation terminated.

../block/nfs.c:63:5: error: unknown type name 'blkcnt_t'
   63 |     blkcnt_t st_blocks;
      |     ^~~~~~~~
../block/nfs.c: In function 'nfs_client_open':
../block/nfs.c:550:27: error: 'struct _stat64' has no member named 'st_blocks'
  550 |     client->st_blocks = st.st_blocks;
      |                           ^
../block/nfs.c: In function 'nfs_get_allocated_file_size':
../block/nfs.c:751:41: error: 'struct _stat64' has no member named 'st_blocks'
  751 |     return (task.ret < 0 ? task.ret : st.st_blocks * 512);
      |                                         ^
../block/nfs.c: In function 'nfs_reopen_prepare':
../block/nfs.c:805:31: error: 'struct _stat64' has no member named 'st_blocks'
  805 |         client->st_blocks = st.st_blocks;
      |                               ^
../block/nfs.c: In function 'nfs_get_allocated_file_size':
../block/nfs.c:752:1: error: control reaches end of non-void function [-Werror=return-type]
  752 | }
      | ^

On msys2/mingw, there is no st_blocks in struct _stat64 yet, we disable the usage of it
on msys2/mingw, and create a typedef long long blkcnt_t; for further implementation

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Message-Id: <20201105123116.674-2-luoyonggang@gmail.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-11-09 15:44:21 +01:00
Alberto Garcia 3441ad4bc4 qcow2: Document and enforce the QCowL2Meta invariants
The QCowL2Meta structure is used to store information about a part of
a write request that touches clusters that need changes in their L2
entries. This happens with newly-allocated clusters or subclusters.

This structure has changed a bit since it was first created and its
current documentation is not quite up-to-date.

A write request can span a region consisting of a combination of
clusters of different types, and qcow2_alloc_host_offset() can
repeatedly call handle_copied() and handle_alloc() to add more
clusters to the mix as long as they all are contiguous on the image
file.

Because of this a write request has a list of QCowL2Meta structures,
one for each part of the request that needs changes in the L2
metadata.

Each one of them spans nb_clusters and has two copy-on-write regions
located immediately before and after the middle region touched by that
part of the write request. Even when those regions themselves are
empty their offsets must be correct because they are used to know the
location of the middle region.

This was not always the case but it is not a problem anymore
because the only two places where QCowL2Meta structures are created
(calculate_l2_meta() and qcow2_co_truncate()) ensure that the
copy-on-write regions are correctly defined, and so do assertions like
the ones in perform_cow().

The conditional initialization of the 'written_to' variable is
therefore unnecessary and is removed by this patch.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201007161323.4667-1-berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-11-09 15:44:21 +01:00
AlexChen 3d86af858e block: Remove unused include
The "qemu-common.h" include is not used, remove it.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: AlexChen <alex.chen@huawei.com>
Message-Id: <5F8FFB94.3030209@huawei.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-11-09 15:44:21 +01:00
Peter Maydell 85c3ed4417 pc,pci,vhost,virtio: fixes
Lots of fixes all over the place.
 virtio-mem and virtio-iommu patches are kind of fixes but
 it seems better to just make them behave sanely than
 try to educate users about the limitations ...
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl+i9YMPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpySQH/Ru/sxB9PncR1HsqSf0HC0tt/EMKgyZTXEwQ
 FITcjkCvBDS98a1VUvvZbjzTEDEZNnoUv94MjdLeBoptJ7GtK6nPoI6Ke0p1Zqbe
 mlY2BCb0FpN8FE+mthjAI03mhw6o8Qo/OPtyISQzUxCVVqUHL5TRAVAQdeidoK8n
 RBQ4WogwM/h7wI0d9GGgSxAON8IRQnBYImtzJieBb6zeScwKVFTWI1tqBdOyFN0/
 AhzQiNZuhZ7a1XGJIsxmWB1NK2kcXNJuOF0ANh4coIHR0JzmH3xRy+Jnf5e3dYsw
 LI23DUZPSTJJXAwKPucyTG7RTX8F55N9DVHC9KDRD6Ntq1oreJ4=
 =pcbN
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pc,pci,vhost,virtio: fixes

Lots of fixes all over the place.
virtio-mem and virtio-iommu patches are kind of fixes but
it seems better to just make them behave sanely than
try to educate users about the limitations ...

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Wed 04 Nov 2020 18:40:03 GMT
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream: (31 commits)
  contrib/vhost-user-blk: fix get_config() information leak
  block/export: fix vhost-user-blk get_config() information leak
  block/export: make vhost-user-blk config space little-endian
  configure: introduce --enable-vhost-user-blk-server
  libvhost-user: follow QEMU comment style
  vhost-blk: set features before setting inflight feature
  Revert "vhost-blk: set features before setting inflight feature"
  net: Add vhost-vdpa in show_netdevs()
  vhost-vdpa: Add qemu_close in vhost_vdpa_cleanup
  vfio: Don't issue full 2^64 unmap
  virtio-iommu: Set supported page size mask
  vfio: Set IOMMU page size as per host supported page size
  memory: Add interface to set iommu page size mask
  virtio-iommu: Add notify_flag_changed() memory region callback
  virtio-iommu: Add replay() memory region callback
  virtio-iommu: Call memory notifiers in attach/detach
  virtio-iommu: Add memory notifiers for map/unmap
  virtio-iommu: Store memory region in endpoint struct
  virtio-iommu: Fix virtio_iommu_mr()
  hw/smbios: Fix leaked fd in save_opt_one() error path
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-11-05 15:16:43 +00:00
Stefan Hajnoczi f8ffcb2bda block/export: fix vhost-user-blk get_config() information leak
Refuse get_config() requests in excess of sizeof(struct virtio_blk_config).

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201027173528.213464-5-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-03 16:39:05 -05:00
Stefan Hajnoczi 11f60f7eae block/export: make vhost-user-blk config space little-endian
VIRTIO 1.0 devices have little-endian configuration space. The
vhost-user-blk-server.c code already uses little-endian for virtqueue
processing but not for the configuration space fields. Fix this so the
vhost-user-blk export works on big-endian hosts.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201027173528.213464-4-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-03 16:39:05 -05:00
Stefan Hajnoczi bc15e44cb2 configure: introduce --enable-vhost-user-blk-server
Make it possible to compile out the vhost-user-blk server. It is enabled
by default on Linux.

Note that vhost-user-server.c depends on libvhost-user, which requires
CONFIG_LINUX. The CONFIG_VHOST_USER dependency was erroneous since that
option controls vhost-user frontends (previously known as "master") and
not device backends (previously known as "slave").

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201027173528.213464-3-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-03 16:39:05 -05:00
Philippe Mathieu-Daudé a0546a7b6f block/nvme: Fix nvme_submit_command() on big-endian host
The Completion Queue Command Identifier is a 16-bit value,
so nvme_submit_command() is unlikely to work on big-endian
hosts, as the relevant bits are truncated.
Fix by using the correct byte-swap function.

Fixes: bdd6a90a9e ("block: Add VFIO based NVMe driver")
Reported-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20201029093306.1063879-25-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:22 +00:00
Philippe Mathieu-Daudé 4b19e9b815 block/nvme: Fix use of write-only doorbells page on Aarch64 arch
qemu_vfio_pci_map_bar() calls mmap(), and mmap(2) states:

  'offset' must be a multiple of the page size as returned
   by sysconf(_SC_PAGE_SIZE).

In commit f68453237b we started to use an offset of 4K which
broke this contract on Aarch64 arch.

Fix by mapping at offset 0, and and accessing doorbells at offset=4K.

Fixes: f68453237b ("block/nvme: Map doorbells pages write-only")
Reported-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-24-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:22 +00:00
Eric Auger 9e13d59884 block/nvme: Align iov's va and size on host page size
Make sure iov's va and size are properly aligned on the
host page size.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-23-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:22 +00:00
Eric Auger f8fd3ebac3 block/nvme: Change size and alignment of prp_list_pages
In preparation of 64kB host page support, let's change the size
and alignment of the prp_list_pages so that the VFIO DMA MAP succeeds
with 64kB host page size. We align on the host page size.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-22-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:22 +00:00
Eric Auger 2387aaced7 block/nvme: Change size and alignment of queue
In preparation of 64kB host page support, let's change the size
and alignment of the queue so that the VFIO DMA MAP succeeds.
We align on the host page size.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-21-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:22 +00:00
Eric Auger 0aecd06049 block/nvme: Change size and alignment of IDENTIFY response buffer
In preparation of 64kB host page support, let's change the size
and alignment of the IDENTIFY command response buffer so that
the VFIO DMA MAP succeeds. We align on the host page size.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-20-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:22 +00:00
Philippe Mathieu-Daudé a652a3ec69 block/nvme: Correct minimum device page size
While trying to simplify the code using a macro, we forgot
the 12-bit shift... Correct that.

Fixes: fad1eb6886 ("block/nvme: Use register definitions from 'block/nvme.h'")
Reported-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-19-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:22 +00:00
Philippe Mathieu-Daudé c8228ac355 block/nvme: Set request_alignment at initialization
Commit bdd6a90a9e ("block: Add VFIO based NVMe driver")
sets the request_alignment in nvme_refresh_limits().
For consistency, also set it during initialization.

Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-18-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé 08d5406798 block/nvme: Simplify nvme_cmd_sync()
As all commands use the ADMIN queue, it is pointless to pass
it as argument each time. Remove the argument, and rename the
function as nvme_admin_cmd_sync() to make this new behavior
clearer.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20201029093306.1063879-17-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé 52b75ea8ec block/nvme: Simplify ADMIN queue access
We don't need to dereference from BDRVNVMeState each time.
Use a NVMeQueuePair pointer on the admin queue.
The nvme_init() becomes easier to review, matching the style
of nvme_add_io_queue().

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-16-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé 3c363c073e block/nvme: Correctly initialize Admin Queue Attributes
From the specification chapter 3.1.8 "AQA - Admin Queue Attributes"
the Admin Submission Queue Size field is a 0’s based value:

  Admin Submission Queue Size (ASQS):

    Defines the size of the Admin Submission Queue in entries.
    Enabling a controller while this field is cleared to 00h
    produces undefined results. The minimum size of the Admin
    Submission Queue is two entries. The maximum size of the
    Admin Submission Queue is 4096 entries.
    This is a 0’s based value.

This bug has never been hit because the device initialization
uses a single command synchronously :)

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-15-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé 76a24781cc block/nvme: Use definitions instead of magic values in add_io_queue()
Replace magic values by definitions, and simplifiy since the
number of queues will never reach 64K.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-14-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé dfa9c6c656 block/nvme: Make nvme_init_queue() return boolean indicating error
Just for consistency, following the example documented since
commit e3fe3988d7 ("error: Document Error API usage rules"),
return a boolean value indicating an error is set or not.
Directly pass errp as the local_err is not requested in our
case. This simplifies a bit nvme_create_queue_pair().

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-12-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé 7a5f00dde3 block/nvme: Make nvme_identify() return boolean indicating error
Just for consistency, following the example documented since
commit e3fe3988d7 ("error: Document Error API usage rules"),
return a boolean value indicating an error is set or not.
Directly pass errp as the local_err is not requested in our
case.

Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20201029093306.1063879-11-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé 1b539bd6db block/nvme: Use unsigned integer for queue counter/size
We can not have negative queue count/size/index, use unsigned type.
Rename 'nr_queues' as 'queue_count' to match the spec naming.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-10-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé 3214b0f094 block/nvme: Move definitions before structure declarations
To be able to use some definitions in structure declarations,
move them earlier. No logical change.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-9-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé 6e1e9ff2d3 block/nvme: Trace queue pair creation/deletion
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-8-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:20 +00:00
Philippe Mathieu-Daudé 51e98b6d21 block/nvme: Improve nvme_free_req_queue_wait() trace information
What we want to trace is the block driver state and the queue index.

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-7-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:20 +00:00
Philippe Mathieu-Daudé 1c914cd120 block/nvme: Trace nvme_poll_queue() per queue
As we want to enable multiple queues, report the event
in each nvme_poll_queue() call, rather than once in
the callback calling nvme_poll_queues().

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-6-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:20 +00:00
Philippe Mathieu-Daudé 15b2260bef block/nvme: Trace controller capabilities
Controllers have different capabilities and report them in the
CAP register. We are particularly interested by the page size
limits.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-5-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:20 +00:00
Philippe Mathieu-Daudé 58ad6ae0cb block/nvme: Report warning with warn_report()
Instead of displaying warning on stderr, use warn_report()
which also displays it on the monitor.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-4-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:20 +00:00
Philippe Mathieu-Daudé 8526e39e99 block/nvme: Use hex format to display offset in trace events
Use the same format used for the hw/vfio/ trace events.

Suggested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-3-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:20 +00:00
AlexChen c9eb2f3e38 block/vvfat: Fix bad printf format specifiers
We should use printf format specifier "%u" instead of "%d" for
argument of type "unsigned int".
In addition, fix two error format problems found by checkpatch.pl:
ERROR: space required after that ',' (ctx:VxV)
+        fprintf(stderr,"%s attributes=0x%02x begin=%u size=%d\n",
                       ^
ERROR: line over 90 characters
+        fprintf(stderr, "%d, %s (%u, %d)\n", i, commit->path ? commit->path : "(null)", commit->param.rename.cluster, commit->action);

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Message-Id: <5FA12620.6030705@huawei.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-11-03 16:24:56 +01:00
Eric Blake dbc7b01492 nbd: Add 'qemu-nbd -A' to expose allocation depth
Allow the server to expose an additional metacontext to be requested
by savvy clients.  qemu-nbd adds a new option -A to expose the
qemu:allocation-depth metacontext through NBD_CMD_BLOCK_STATUS; this
can also be set via QMP when using block-export-add.

qemu as client is hacked into viewing the key aspects of this new
context by abusing the already-experimental x-dirty-bitmap option to
collapse all depths greater than 2, which results in a tri-state value
visible in the output of 'qemu-img map --output=json' (yes, that means
x-dirty-bitmap is now a bit of a misnomer, but I didn't feel like
renaming it as it would introduce a needless break of back-compat,
even though we make no compat guarantees with x- members):

unallocated (depth 0) => "zero":false, "data":true
local (depth 1)       => "zero":false, "data":false
backing (depth 2+)    => "zero":true,  "data":true

libnbd as client is probably a nicer way to get at the information
without having to decipher such hacks in qemu as client. ;)

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201027050556.269064-11-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2020-10-30 15:22:00 -05:00
Eric Blake a92b1b065e block: Return depth level during bdrv_is_allocated_above
When checking for allocation across a chain, it's already easy to
count the depth within the chain at which the allocation is found.
Instead of throwing that information away, return it to the caller.
Existing callers only cared about allocated/non-allocated, but having
a depth available will be used by NBD in the next patch.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201027050556.269064-9-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: rebase to master]
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-10-30 15:21:23 -05:00
Greg Kurz 1a6d3bd229 block: End quiescent sections when a BDS is deleted
If a BDS gets deleted during blk_drain_all(), it might miss a
call to bdrv_do_drained_end(). This means missing a call to
aio_enable_external() and the AIO context remains disabled for
ever. This can cause a device to become irresponsive and to
disrupt the guest execution, ie. hang, loop forever or worse.

This scenario is quite easy to encounter with virtio-scsi
on POWER when punching multiple blockdev-create QMP commands
while the guest is booting and it is still running the SLOF
firmware. This happens because SLOF disables/re-enables PCI
devices multiple times via IO/MEM/MASTER bits of PCI_COMMAND
register after the initial probe/feature negotiation, as it
tends to work with a single device at a time at various stages
like probing and running block/network bootloaders without
doing a full reset in-between. This naturally generates many
dataplane stops and starts, and thus many drain sections that
can race with blockdev_create_run(). In the end, SLOF bails
out.

It is somehow reproducible on x86 but it requires to generate
articial dataplane start/stop activity with stop/cont QMP
commands. In this case, seabios ends up looping for ever,
waiting for the virtio-scsi device to send a response to
a command it never received.

Add a helper that pairs all previously called bdrv_do_drained_begin()
with a bdrv_do_drained_end() and call it from bdrv_close().
While at it, update the "/bdrv-drain/graph-change/drain_all"
test in test-bdrv-drain so that it can catch the issue.

BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1874441
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160346526998.272601.9045392804399803158.stgit@bahia.lan>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-27 15:26:20 +01:00
Alberto Garcia 46cd1e8a47 qcow2: Skip copy-on-write when allocating a zero cluster
Since commit c8bb23cbdb when a write
request results in a new allocation QEMU first tries to see if the
rest of the cluster outside the written area contains only zeroes.

In that case, instead of doing a normal copy-on-write operation and
writing explicit zero buffers to disk, the code zeroes the whole
cluster efficiently using pwrite_zeroes() with BDRV_REQ_NO_FALLBACK.

This improves performance very significantly but it only happens when
we are writing to an area that was completely unallocated before. Zero
clusters (QCOW2_CLUSTER_ZERO_*) are treated like normal clusters and
are therefore slower to allocate.

This happens because the code uses bdrv_is_allocated_above() rather
bdrv_block_status_above(). The former is not as accurate for this
purpose but it is faster. However in the case of qcow2 the underlying
call does already report zero clusters just fine so there is no reason
why we cannot use that information.

After testing 4KB writes on an image that only contains zero clusters
this patch results in almost five times more IOPS.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <6d77cab968c501c44d6e1089b9bc91b04170b49e.1603731354.git.berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-27 15:26:20 +01:00
Alberto Garcia d40f4a565a qcow2: Report BDRV_BLOCK_ZERO more accurately in bdrv_co_block_status()
If a BlockDriverState supports backing files but has none then any
unallocated area reads back as zeroes.

bdrv_co_block_status() is only reporting this is if want_zero is true,
but this is an inexpensive test and there is no reason not to do it in
all cases.

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <66fa0914a0e2b727ab6d1b63ca773d7cd29a9a9e.1603731354.git.berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-27 15:26:20 +01:00
Peter Maydell a95e0396c8 * fix --disable-tcg builds (Claudio)
* Fixes for macOS --enable-modules build and OpenBSD curses/iconv detection (myself)
 * Start preparing for meson 0.56 (myself)
 * Move directory configuration to meson (myself)
 * Start untangling qemu_init (myself)
 * Windows fixes (Sunil)
 * Remove -no-kbm (Thomas)
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl+WrxEUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNQAggAqfucqEQvz6s+DCPv2u572diyMvhe
 Y7vmaQF0qYKoAvy5OLqGlqXVsn8lwf19zJWo9Z7k4qNefWl84ii0J/kEmnolzTGq
 7Z0CRSnGbNQy9YedYXuymaR3E0VY+6lsPnzIpufQISzQRdjzT8OQ51DMAhc04oQl
 saXsts7y+om+tzvW2JFGtNsfFRUjcRKqjIAVfwneBXFW9TRD2epvYxz/S0o+XJwF
 eSiINvTqDxxPyy6XJykC46xf/TTfReHv6fQgTn7Jw3TQuo4m7qXLi5Vj8W1erZJv
 t3xhZNabt813T6ztNcAAuJ0srIn55Ac7Fuq3/1ecgeVD08ntmabe4WhKRg==
 =931x
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging

* fix --disable-tcg builds (Claudio)
* Fixes for macOS --enable-modules build and OpenBSD curses/iconv detection (myself)
* Start preparing for meson 0.56 (myself)
* Move directory configuration to meson (myself)
* Start untangling qemu_init (myself)
* Windows fixes (Sunil)
* Remove -no-kbm (Thomas)

# gpg: Signature made Mon 26 Oct 2020 11:12:17 GMT
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini-gitlab/tags/for-upstream:
  machine: move SMP initialization from vl.c
  machine: move UP defaults to class_base_init
  machine: remove deprecated -machine enforce-config-section option
  win32: boot broken when bind & data dir are the same
  WHPX: Fix WHPX build break
  configure: move install_blobs from configure to meson
  configure: remove unused variable from config-host.mak
  configure: move directory options from config-host.mak to meson
  configure: allow configuring localedir
  Makefile: separate meson rerun from the rest of the ninja invocation
  Remove deprecated -no-kvm option
  replay: do not build if TCG is not available
  qtest: unbreak non-TCG builds in bios-tables-test
  hw/core/qdev-clock: add a reference on aliased clocks
  do not use colons in test names
  meson: rewrite curses/iconv test
  build: fix macOS --enable-modules build

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-10-26 15:49:11 +00:00
Vladimir Sementsov-Ogievskiy 7e7e510077 block/io: fix bdrv_is_allocated_above
bdrv_is_allocated_above wrongly handles short backing files: it reports
after-EOF space as UNALLOCATED which is wrong, as on read the data is
generated on the level of short backing file (if all overlays have
unallocated areas at that place).

Reusing bdrv_common_block_status_above fixes the issue and unifies code
path.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20200924194003.22080-5-vsementsov@virtuozzo.com
[Fix s/has/have/ as suggested by Eric Blake. Fix s/area/areas/.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Vladimir Sementsov-Ogievskiy 624f27bbe9 block/io: bdrv_common_block_status_above: support bs == base
We are going to reuse bdrv_common_block_status_above in
bdrv_is_allocated_above. bdrv_is_allocated_above may be called with
include_base == false and still bs == base (for ex. from img_rebase()).

So, support this corner case.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20200924194003.22080-4-vsementsov@virtuozzo.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Vladimir Sementsov-Ogievskiy 3555a43261 block/io: bdrv_common_block_status_above: support include_base
In order to reuse bdrv_common_block_status_above in
bdrv_is_allocated_above, let's support include_base parameter.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20200924194003.22080-3-vsementsov@virtuozzo.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Vladimir Sementsov-Ogievskiy 67c095c8b8 block/io: fix bdrv_co_block_status_above
bdrv_co_block_status_above has several design problems with handling
short backing files:

1. With want_zeros=true, it may return ret with BDRV_BLOCK_ZERO but
without BDRV_BLOCK_ALLOCATED flag, when actually short backing file
which produces these after-EOF zeros is inside requested backing
sequence.

2. With want_zero=false, it may return pnum=0 prior to actual EOF,
because of EOF of short backing file.

Fix these things, making logic about short backing files clearer.

With fixed bdrv_block_status_above we also have to improve is_zero in
qcow2 code, otherwise iotest 154 will fail, because with this patch we
stop to merge zeros of different types (produced by fully unallocated
in the whole backing chain regions vs produced by short backing files).

Note also, that this patch leaves for another day the general problem
around block-status: misuse of BDRV_BLOCK_ALLOCATED as is-fs-allocated
vs go-to-backing.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20200924194003.22080-2-vsementsov@virtuozzo.com
[Fix s/comes/come/ as suggested by Eric Blake
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi d9b495f9c6 block/export: add vhost-user-blk multi-queue support
Allow the number of queues to be configured using --export
vhost-user-blk,num-queues=N. This setting should match the QEMU --device
vhost-user-blk-pci,num-queues=N setting but QEMU vhost-user-blk.c lowers
its own value if the vhost-user-blk backend offers fewer queues than
QEMU.

The vhost-user-blk-server.c code is already capable of multi-queue. All
virtqueue processing runs in the same AioContext. No new locking is
needed.

Add the num-queues=N option and set the VIRTIO_BLK_F_MQ feature bit.
Note that the feature bit only announces the presence of the num_queues
configuration space field. It does not promise that there is more than 1
virtqueue, so we can set it unconditionally.

I tested multi-queue by running a random read fio test with numjobs=4 on
an -smp 4 guest. After the benchmark finished the guest /proc/interrupts
file showed activity on all 4 virtio-blk MSI-X. The /sys/block/vda/mq/
directory shows that Linux blk-mq has 4 queues configured.

An automated test is included in the next commit.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20201001144604.559733-2-stefanha@redhat.com
[Fixed accidental tab characters as suggested by Markus Armbruster
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi f51d23c80a block/export: add iothread and fixed-iothread options
Make it possible to specify the iothread where the export will run. By
default the block node can be moved to other AioContexts later and the
export will follow. The fixed-iothread option forces strict behavior
that prevents changing AioContext while the export is active. See the
QAPI docs for details.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200929125516.186715-5-stefanha@redhat.com
[Fix stray '#' character in block-export.json and add missing "(since:
5.2)" as suggested by Eric Blake.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi cbc20bfb8f block: move block exports to libblockdev
Block exports are used by softmmu, qemu-storage-daemon, and qemu-nbd.
They are not used by other programs and are not otherwise needed in
libblock.

Undo the recent move of blockdev-nbd.c from blockdev_ss into block_ss.
Since bdrv_close_all() (libblock) calls blk_exp_close_all()
(libblockdev) a stub function is required..

Make qemu-nbd.c use signal handling utility functions instead of
duplicating the code. This helps because os-posix.c is in libblockdev
and it depends on a qemu_system_killed() symbol that qemu-nbd.c lacks.
Once we use the signal handling utility functions we also end up
providing the necessary symbol.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20200929125516.186715-4-stefanha@redhat.com
[Fixed s/ndb/nbd/ typo in commit description as suggested by Eric Blake
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi 3a213f83d9 util/vhost-user-server: use static library in meson.build
Don't compile contrib/libvhost-user/libvhost-user.c again. Instead build
the static library once and then reuse it throughout QEMU.

Also switch from CONFIG_LINUX to CONFIG_VHOST_USER, which is what the
vhost-user tools (vhost-user-gpu, etc) do.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-14-stefanha@redhat.com
[Added CONFIG_LINUX again because libvhost-user doesn't build on macOS.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi 80a06cc52b util/vhost-user-server: move header to include/
Headers used by other subsystems are located in include/. Also add the
vhost-user-server and vhost-user-blk-server headers to MAINTAINERS.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-13-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi 90fc91d50b block/export: convert vhost-user-blk server to block export API
Use the new QAPI block exports API instead of defining our own QOM
objects.

This is a large change because the lifecycle of VuBlockDev needs to
follow BlockExportDriver. QOM properties are replaced by QAPI options
objects.

VuBlockDev is renamed VuBlkExport and contains a BlockExport field.
Several fields can be dropped since BlockExport already has equivalents.

The file names and meson build integration will be adjusted in a future
patch. libvhost-user should probably be built as a static library that
is linked into QEMU instead of as a .c file that results in duplicate
compilation.

The new command-line syntax is:

  $ qemu-storage-daemon \
      --blockdev file,node-name=drive0,filename=test.img \
      --export vhost-user-blk,node-name=drive0,id=export0,unix-socket=/tmp/vhost-user-blk.sock

Note that unix-socket is optional because we may wish to accept chardevs
too in the future.

Markus noted that supported address families are not explicit in the
QAPI schema. It is unlikely that support for more address families will
be added since file descriptor passing is required and few address
families support it. If a new address family needs to be added, then the
QAPI 'features' syntax can be used to advertize them.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20200924151549.913737-12-stefanha@redhat.com
[Skip test on big-endian host architectures because this device doesn't
support them yet (as already mentioned in a code comment).
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi 0534b1b227 block/export: report flush errors
Propagate the flush return value since errors are possible.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-11-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi 7185c85776 util/vhost-user-server: rework vu_client_trip() coroutine lifecycle
The vu_client_trip() coroutine is leaked during AioContext switching. It
is also unsafe to destroy the vu_dev in panic_cb() since its callers
still access it in some cases.

Rework the lifecycle to solve these safety issues.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-10-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi 47ba680466 util/vhost-user-server: drop unused DevicePanicNotifier
The device panic notifier callback is not used. Drop it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-7-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi df6af7ce77 block/export: consolidate request structs into VuBlockReq
Only one struct is needed per request. Drop req_data and the separate
VuBlockReq instance. Instead let vu_queue_pop() allocate everything at
once.

This fixes the req_data memory leak in vu_block_virtio_process_req().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-6-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Coiby Xu 3578389bcf block/export: vhost-user block device backend server
By making use of libvhost-user, block device drive can be shared to
the connected vhost-user client. Only one client can connect to the
server one time.

Since vhost-user-server needs a block drive to be created first, delay
the creation of this object.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20200918080912.321299-6-coiby.xu@gmail.com
[Shorten "vhost_user_blk_server" string to "vhost_user_blk" to avoid the
following compiler warning:
../block/export/vhost-user-blk-server.c:178:50: error: ‘%s’ directive output truncated writing 21 bytes into a region of size 20 [-Werror=format-truncation=]
and fix "Invalid size %ld ..." ssize_t format string arguments for
32-bit hosts.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Philippe Mathieu-Daudé f25e7ab2b0 block/nvme: Add driver statistics for access alignment and hw errors
Keep statistics of some hardware errors, and number of
aligned/unaligned I/O accesses.

QMP example booting a full RHEL 8.3 aarch64 guest:

{ "execute": "query-blockstats" }
{
    "return": [
        {
            "device": "",
            "node-name": "drive0",
            "stats": {
                "flush_total_time_ns": 6026948,
                "wr_highest_offset": 3383991230464,
                "wr_total_time_ns": 807450995,
                "failed_wr_operations": 0,
                "failed_rd_operations": 0,
                "wr_merged": 3,
                "wr_bytes": 50133504,
                "failed_unmap_operations": 0,
                "failed_flush_operations": 0,
                "account_invalid": false,
                "rd_total_time_ns": 1846979900,
                "flush_operations": 130,
                "wr_operations": 659,
                "rd_merged": 1192,
                "rd_bytes": 218244096,
                "account_failed": false,
                "idle_time_ns": 2678641497,
                "rd_operations": 7406,
            },
            "driver-specific": {
                "driver": "nvme",
                "completion-errors": 0,
                "unaligned-accesses": 2959,
                "aligned-accesses": 4477
            },
            "qdev": "/machine/peripheral-anon/device[0]/virtio-backend"
        }
    ]
}

Suggested-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20201001162939.1567915-1-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Claudio Fontana 9b1c911654 replay: do not build if TCG is not available
this fixes non-TCG builds broken recently by replay reverse debugging.

Stub the needed functions in stub/, splitting roughly between functions
needed only by system emulation, by system emulation and tools,
and by everyone.  This includes duplicating some code in replay/, and
puts the logic for non-replay related events in the replay/ module (+
the stubs), so this should be revisited in the future.

Surprisingly, only _one_ qtest was affected by this, ide-test.c, which
resulted in a buzz as the bh events were never delivered, and the bh
never executed.

Many other subsystems _should_ have been affected.

This fixes the immediate issue, however a better way to group replay
functionality to TCG-only code could be developed in the long term.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Message-Id: <20201013192123.22632-4-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-22 11:53:54 -04:00
Daniel P. Berrangé e1c4269763 block: deprecate the sheepdog block driver
This thread from a little over a year ago:

  http://lists.wpkg.org/pipermail/sheepdog/2019-March/thread.html

states that sheepdog is no longer actively developed. The only mentioned
users are some companies who are said to have it for legacy reasons with
plans to replace it by Ceph. There is talk about cutting out existing
features to turn it into a simple demo of how to write a distributed
block service. There is no evidence of anyone working on that idea:

  https://github.com/sheepdog/sheepdog/commits/master

No real commits to git since Jan 2018, and before then just some minor
technical debt cleanup.

There is essentially no activity on the mailing list aside from
patches to QEMU that get CC'd due to our MAINTAINERS entry.

Fedora packages for sheepdog failed to build from upstream source
because of the more strict linker that no longer merges duplicate
global symbols. Fedora patches it to add the missing "extern"
annotations and presumably other distros do to, but upstream source
remains broken.

There is only basic compile testing, no functional testing of the
driver.

Since there are no build pre-requisites the sheepdog driver is currently
enabled unconditionally. This would result in configure issuing a
deprecation warning by default for all users. Thus the configure default
is changed to disable it, requiring users to pass --enable-sheepdog to
build the driver.

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20201002113243.2347710-3-berrange@redhat.com>
Reviewed-by: Neal Gompa <ngompa13@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-15 16:06:28 +02:00
Elena Afanasova 5b4c95d0a3 block/blkdebug: fix memory leak
Spotted by PVS-Studio

Signed-off-by: Elena Afanasova <eafanasova@gmail.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1e903f928eb3da332cc95e2a6f87243bd9fe66e4.camel@gmail.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-10-13 13:33:46 +02:00
Christian Borntraeger cd466702f0 vmdk: fix maybe uninitialized warnings
Fedora 32 gcc 10 seems to give false positives:

Compiling C object libblock.fa.p/block_vmdk.c.o
../block/vmdk.c: In function ‘vmdk_parse_extents’:
../block/vmdk.c:587:5: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  587 |     g_free(extent->l1_table);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~
../block/vmdk.c:754:17: note: ‘extent’ was declared here
  754 |     VmdkExtent *extent;
      |                 ^~~~~~
../block/vmdk.c:620:11: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  620 |     ret = vmdk_init_tables(bs, extent, errp);
      |           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../block/vmdk.c:598:17: note: ‘extent’ was declared here
  598 |     VmdkExtent *extent;
      |                 ^~~~~~
../block/vmdk.c:1178:39: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
 1178 |             extent->flat_start_offset = flat_offset << 9;
      |             ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~
../block/vmdk.c: In function ‘vmdk_open_vmdk4’:
../block/vmdk.c:581:22: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  581 |     extent->l2_cache =
      |     ~~~~~~~~~~~~~~~~~^
  582 |         g_malloc(extent->entry_size * extent->l2_size * L2_CACHE_SIZE);
      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../block/vmdk.c:872:17: note: ‘extent’ was declared here
  872 |     VmdkExtent *extent;
      |                 ^~~~~~
../block/vmdk.c: In function ‘vmdk_open’:
../block/vmdk.c:620:11: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  620 |     ret = vmdk_init_tables(bs, extent, errp);
      |           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../block/vmdk.c:598:17: note: ‘extent’ was declared here
  598 |     VmdkExtent *extent;
      |                 ^~~~~~
cc1: all warnings being treated as errors
make: *** [Makefile.ninja:884: libblock.fa.p/block_vmdk.c.o] Error 1

fix them by assigning a default value.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Fam Zheng <fam@euphon.net>
Message-Id: <20200930155859.303148-2-borntraeger@de.ibm.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-10-13 13:33:45 +02:00
Vladimir Sementsov-Ogievskiy 99d72dba1c block/nbd: nbd_co_reconnect_loop(): don't connect if drained
In a recent commit 12c75e20a2 we've improved
nbd_co_reconnect_loop() to not make drain wait for additional sleep.
Similarly, we shouldn't try to connect, if previous sleep was
interrupted by drain begin, otherwise drain_begin will have to wait for
the whole connection attempt.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200903190301.367620-5-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-10-09 15:04:32 -05:00
Vladimir Sementsov-Ogievskiy 46f56631b5 block/nbd: fix reconnect-delay
reconnect-delay has a design flaw: we handle it in the same loop where
we do connection attempt. So, reconnect-delay may be exceeded by
unpredictable time of connection attempt.

Let's instead use separate timer.

How to reproduce the bug:

1. Create an image on node1:
   qemu-img create -f qcow2 xx 100M

2. Start NBD server on node1:
   qemu-nbd xx

3. On node2 start qemu-io:

./build/qemu-io --image-opts \
driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=15

4. Type 'read 0 512' in qemu-io interface to check that connection
   works

Be careful: you should make steps 5-7 in a short time, less than 15
seconds.

5. Kill nbd server on node1

6. Run 'read 0 512' in qemu-io interface again, to be sure that nbd
client goes to reconnect loop.

7. On node1 run the following command

   sudo iptables -A INPUT -p tcp --dport 10809 -j DROP

This will make the connect() call of qemu-io at node2 take a long time.

And you'll see that read command in qemu-io will hang for a long time,
more than 15 seconds specified by reconnect-delay parameter. It's the
bug.

8. Don't forget to drop iptables rule on node1:

   sudo iptables -D INPUT -p tcp --dport 10809 -j DROP

Important note: Step [5] is necessary to reproduce _this_ bug. If we
miss step [5], the read command (step 6) will hang for a long time and
this commit doesn't help, because there will be not long connect() to
unreachable host, but long sendmsg() to unreachable host, which should
be fixed by enabling and adjusting keep-alive on the socket, which is a
thing for further patch set.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200903190301.367620-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-10-09 15:04:32 -05:00
Vladimir Sementsov-Ogievskiy 8a509afd72 block/nbd: correctly use qio_channel_detach_aio_context when needed
Don't use nbd_client_detach_aio_context() driver handler where we want
to finalize the connection. We should directly use
qio_channel_detach_aio_context() in such cases. Driver handler may (and
will) contain another things, unrelated to the qio channel.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200903190301.367620-3-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-10-09 15:04:32 -05:00
Vladimir Sementsov-Ogievskiy 8c517de24a block/nbd: fix drain dead-lock because of nbd reconnect-delay
We pause reconnect process during drained section. So, if we have some
requests, waiting for reconnect we should cancel them, otherwise they
deadlock the drained section.

How to reproduce:

1. Create an image:
   qemu-img create -f qcow2 xx 100M

2. Start NBD server:
   qemu-nbd xx

3. Start vm with second nbd disk on node2, like this:

  ./build/x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \
     file=/work/images/cent7.qcow2 -drive \
     driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=60 \
     -vnc :0 -m 2G -enable-kvm -vga std

4. Access the vm through vnc (or some other way?), and check that NBD
   drive works:

   dd if=/dev/sdb of=/dev/null bs=1M count=10

   - the command should succeed.

5. Now, kill the nbd server, and run dd in the guest again:

   dd if=/dev/sdb of=/dev/null bs=1M count=10

Now Qemu is trying to reconnect, and dd-generated requests are waiting
for the connection (they will wait up to 60 seconds (see
reconnect-delay option above) and than fail). But suddenly, vm may
totally hang in the deadlock. You may need to increase reconnect-delay
period to catch the dead-lock.

VM doesn't respond because drain dead-lock happens in cpu thread with
global mutex taken. That's not good thing by itself and is not fixed
by this commit (true way is using iothreads). Still this commit fixes
drain dead-lock itself.

Note: probably, we can instead continue to reconnect during drained
section. To achieve this, we may move negotiation to the connect thread
to make it independent of bs aio context. But expanding drained section
doesn't seem good anyway. So, let's now fix the bug the simplest way.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200903190301.367620-2-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-10-09 15:04:32 -05:00
Peter Maydell f2687fdb75 * Reverse debugging (Pavel)
* CFLAGS cleanup (Paolo)
 * ASLR fix (Mark)
 * cpus.c refactoring (Claudio)
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl98EB0UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOCsQf9G7EUAK1zcEOx20LtDdXFrk4tjsRp
 S83OGdihWe8SM+XiY9BfqsBbXdByqF+SitePOV3feGK0mOP5vtJIL7/2DLrtFTeF
 wOeARRA9ePVb7hcL5oXAQeE3bXrX8wq8Qtw9xAoHdw5JAEVmKIEJS6AL5Eu3M2Fh
 pvdBoV84pOm2/ARS3eRstRyW8gCC8rdLDlNsVDtCbYdNVq+VdkzR0l5Phc8JDx1M
 Qjdl1KpN6ZkuN8M6tnaQNTb9IUVu5c1tu5jdR6JdLUqAWp1wYZJ6r2jSatZWfLR3
 H+gzFsDoLPfCjZ3IhfZyvzF5leSZmdbFfzI0tHS1UJ/ZZYjutDvlPlbyYA==
 =Jys5
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging

* Reverse debugging (Pavel)
* CFLAGS cleanup (Paolo)
* ASLR fix (Mark)
* cpus.c refactoring (Claudio)

# gpg: Signature made Tue 06 Oct 2020 07:35:09 BST
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini-gitlab/tags/for-upstream: (37 commits)
  tests/acceptance: add reverse debugging test
  replay: create temporary snapshot at debugger connection
  replay: describe reverse debugging in docs/replay.txt
  gdbstub: add reverse continue support in replay mode
  gdbstub: add reverse step support in replay mode
  replay: flush rr queue before loading the vmstate
  replay: implement replay-seek command
  replay: introduce breakpoint at the specified step
  replay: introduce info hmp/qmp command
  qapi: introduce replay.json for record/replay-related stuff
  migration: introduce icount field for snapshots
  qcow2: introduce icount field for snapshots
  replay: provide an accessor for rr filename
  replay: don't record interrupt poll
  configure: don't enable ASLR for --enable-debug Windows builds
  configure: consistently pass CFLAGS/CXXFLAGS/LDFLAGS to meson
  configure: do not clobber environment CFLAGS/CXXFLAGS/LDFLAGS
  dtc: Convert Makefile bits to meson bits
  slirp: Convert Makefile bits to meson bits
  accel/tcg: use current_machine as it is always set for softmmu
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-10-06 15:04:10 +01:00
Pavel Dovgalyuk b39847a505 migration: introduce icount field for snapshots
Saving icount as a parameters of the snapshot allows navigation between
them in the execution replay scenario.
This information can be used for finding a specific snapshot for proceeding
the recorded execution to the specific moment of the time.
E.g., 'reverse step' action (introduced in one of the following patches)
needs to load the nearest snapshot which is prior to the current moment
of time.
This patch also updates snapshot test which verifies qemu monitor output.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Acked-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>

--

v4 changes:
 - squashed format update with test output update
v7 changes:
 - introduced the spaces between the fields in snapshot info output
 - updated the test to match new field widths
Message-Id: <160174518865.12451.14327573383978752463.stgit@pasha-ThinkPad-X280>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-06 08:34:49 +02:00
Pavel Dovgalyuk bbacffc5f7 qcow2: introduce icount field for snapshots
This patch introduces the icount field for saving within the snapshot.
It is required for navigation between the snapshots in record/replay mode.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Acked-by: Kevin Wolf <kwolf@redhat.com>

--

v7 changes:
 - also fix the test which checks qcow2 snapshot extra data
Message-Id: <160174518284.12451.2301137308458777398.stgit@pasha-ThinkPad-X280>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-06 08:34:49 +02:00
Vladimir Sementsov-Ogievskiy b33b354f3a block/io: refactor save/load vmstate
Like for read/write in a previous commit, drop extra indirection layer,
generate directly bdrv_readv_vmstate() and bdrv_writev_vmstate().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200924185414.28642-8-vsementsov@virtuozzo.com>
2020-10-05 10:59:42 +01:00
Vladimir Sementsov-Ogievskiy fae2681add block: drop bdrv_prwv
Now that we are not maintaining boilerplate code for coroutine
wrappers, there is no more sense in keeping the extra indirection layer
of bdrv_prwv().  Let's drop it and instead generate pure bdrv_preadv()
and bdrv_pwritev().

Currently, bdrv_pwritev() and bdrv_preadv() are returning bytes on
success, auto generated functions will instead return zero, as their
_co_ prototype. Still, it's simple to make the conversion safe: the
only external user of bdrv_pwritev() is test-bdrv-drain, and it is
comfortable enough with bdrv_co_pwritev() instead. So prototypes are
moved to local block/coroutines.h. Next, the only internal use is
bdrv_pread() and bdrv_pwrite(), which are modified to return bytes on
success.

Of course, it would be great to convert bdrv_pread() and bdrv_pwrite()
to return 0 on success. But this requires audit (and probably
conversion) of all their users, let's leave it for another day
refactoring.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200924185414.28642-7-vsementsov@virtuozzo.com>
2020-10-05 10:59:42 +01:00
Vladimir Sementsov-Ogievskiy 9bb4b066cc block: generate coroutine-wrapper code
Use code generation implemented in previous commit to generated
coroutine wrappers in block.c and block/io.c

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200924185414.28642-6-vsementsov@virtuozzo.com>
2020-10-05 10:59:42 +01:00
Vladimir Sementsov-Ogievskiy aaaa20b69b scripts: add block-coroutine-wrapper.py
We have a very frequent pattern of creating a coroutine from a function
with several arguments:

  - create a structure to pack parameters
  - create _entry function to call original function taking parameters
    from struct
  - do different magic to handle completion: set ret to NOT_DONE or
    EINPROGRESS or use separate bool field
  - fill the struct and create coroutine from _entry function with this
    struct as a parameter
  - do coroutine enter and BDRV_POLL_WHILE loop

Let's reduce code duplication by generating coroutine wrappers.

This patch adds scripts/block-coroutine-wrapper.py together with some
friends, which will generate functions with declared prototypes marked
by the 'generated_co_wrapper' specifier.

The usage of new code generation is as follows:

    1. define the coroutine function somewhere

        int coroutine_fn bdrv_co_NAME(...) {...}

    2. declare in some header file

        int generated_co_wrapper bdrv_NAME(...);

       with same list of parameters (generated_co_wrapper is
       defined in "include/block/block.h").

    3. Make sure the block_gen_c declaration in block/meson.build
       mentions the file with your marker function.

Still, no function is now marked, this work is for the following
commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200924185414.28642-5-vsementsov@virtuozzo.com>
[Added encoding='utf-8' to open() calls as requested by Vladimir. Fixed
typo and grammar issues pointed out by Eric Blake. Removed clang-format
dependency that caused build test issues.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-05 10:59:06 +01:00
Vladimir Sementsov-Ogievskiy 21c2283ebc block: declare some coroutine functions in block/coroutines.h
We are going to keep coroutine-wrappers code (structure-packing
parameters, BDRV_POLL wrapper functions) in separate auto-generated
files. So, we'll need a header with declaration of original _co_
functions, for those which are static now. As well, we'll need
declarations for wrapper functions. Do these declarations now, as a
preparation step.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200924185414.28642-4-vsementsov@virtuozzo.com>
2020-10-05 09:35:52 +01:00
Vladimir Sementsov-Ogievskiy f9e694cb32 block/io: refactor coroutine wrappers
Most of our coroutine wrappers already follow this convention:

We have 'coroutine_fn bdrv_co_<something>(<normal argument list>)' as
the core function, and a wrapper 'bdrv_<something>(<same argument
list>)' which does parameter packing and calls bdrv_run_co().

The only outsiders are the bdrv_prwv_co and
bdrv_common_block_status_above wrappers. Let's refactor them to behave
as the others, it simplifies further conversion of coroutine wrappers.

This patch adds an indirection layer, but it will be compensated by
a further commit, which will drop bdrv_co_prwv together with the
is_write logic, to keep the read and write paths separate.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200924185414.28642-3-vsementsov@virtuozzo.com>
2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé eefffb0244 block/nvme: Replace magic value by SCALE_MS definition
Use self-explicit SCALE_MS definition instead of magic value
(missed in similar commit e4f310fe7f).

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200922083821.578519-7-philmd@redhat.com>
2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé fad1eb6886 block/nvme: Use register definitions from 'block/nvme.h'
Use the NVMe register definitions from "block/nvme.h" which
ease a bit reviewing the code while matching the datasheet.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200922083821.578519-6-philmd@redhat.com>
2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé 9406e0d97e block/nvme: Drop NVMeRegs structure, directly use NvmeBar
NVMeRegs only contains NvmeBar. Simplify the code by using NvmeBar
directly.

This triggers a checkpatch.pl error:

  ERROR: Use of volatile is usually wrong, please add a comment
  #30: FILE: block/nvme.c:691:
  +    volatile NvmeBar *regs;

This is a false positive as in our case we are using I/O registers,
so the 'volatile' use is justified.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200922083821.578519-5-philmd@redhat.com>
2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé 37d7a45abd block/nvme: Reduce I/O registers scope
We only access the I/O register in nvme_init().
Remove the reference in BDRVNVMeState and reduce its scope.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200922083821.578519-4-philmd@redhat.com>
2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé f68453237b block/nvme: Map doorbells pages write-only
Per the datasheet sections 3.1.13/3.1.14:
  "The host should not read the doorbell registers."

As we don't need read access, map the doorbells with write-only
permission. We keep a reference to this mapped address in the
BDRVNVMeState structure.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200922083821.578519-3-philmd@redhat.com>
2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé b02c01a513 util/vfio-helpers: Pass page protections to qemu_vfio_pci_map_bar()
Pages are currently mapped READ/WRITE. To be able to use different
protections, add a new argument to qemu_vfio_pci_map_bar().

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200922083821.578519-2-philmd@redhat.com>
2020-10-05 09:35:52 +01:00
Alberto Garcia c508c73dca qcow2: Use L1E_SIZE in qcow2_write_l1_entry()
We overlooked these in 02b1ecfa10

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200928162333.14998-1-berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf 30dbc81d31 block/export: Move writable to BlockExportOptions
The 'writable' option is a basic option that will probably be applicable
to most if not all export types that we will implement. Move it from NBD
to the generic BlockExport layer.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-26-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf 8cade320c8 block/export: Add query-block-exports
This adds a simple QMP command to query the list of block exports.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-25-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf 331170e073 block/export: Create BlockBackend in blk_exp_add()
Every export type will need a BlockBackend, so creating it centrally in
blk_exp_add() instead of the .create driver callback avoids duplication.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-24-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf 37a4f70cea block/export: Move blk to BlockExport
Every block export has a BlockBackend representing the disk that is
exported. It should live in BlockExport therefore.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-23-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf 1a9f7a804f block/export: Add BLOCK_EXPORT_DELETED event
Clients may want to know when an export has finally disappeard
(block-export-del returns earlier than that in the general case), so add
a QAPI event for it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200924152717.287415-22-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf 3c3bc462ad block/export: Add block-export-del
Implement a new QMP command block-export-del and make nbd-server-remove
a wrapper around it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-21-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf 3859ad36f0 block/export: Move strong user reference to block_exports
The reference owned by the user/monitor that is created when adding the
export and dropped when removing it was tied to the 'exports' list in
nbd/server.c. Every block export will have a user reference, so move it
to the block export level and tie it to the 'block_exports' list in
block/export/export.c instead. This is necessary for introducing a QMP
command for removing exports.

Note that exports are present in block_exports even after the user has
requested shutdown. This is different from NBD's exports where exports
are immediately removed on a shutdown request, even if they are still in
the process of shutting down. In order to avoid that the user still
interacts with an export that is shutting down (and possibly removes it
a second time), we need to remember if the user actually still owns it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-20-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf d53be9ce55 block/export: Add 'id' option to block-export-add
We'll need an id to identify block exports in monitor commands. This
adds one.

Note that this is different from the 'name' option in the NBD server,
which is the externally visible export name. While block export ids need
to be unique in the whole process, export names must be unique only for
the same server. Different export types or (potentially in the future)
multiple NBD servers can have the same export name externally, but still
need different block export ids internally.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-19-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf bc4ee65b8c block/export: Add blk_exp_close_all(_type)
This adds a function to shut down all block exports, and another one to
shut down the block exports of a single type. The latter is used for now
when stopping the NBD server. As soon as we implement support for
multiple NBD servers, we'll need a per-server list of exports and it
will be replaced by a function using that.

As a side effect, the BlockExport layer has a list tracking all existing
exports now. closed_exports loses its only user and can go away.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-18-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf a6ff798966 block/export: Allocate BlockExport in blk_exp_add()
Instead of letting the driver allocate and return the BlockExport
object, allocate it already in blk_exp_add() and pass it. This allows us
to initialise the generic part before calling into the driver so that
the driver can just use these values instead of having to parse the
options a second time.

For symmetry, move freeing the BlockExport to blk_exp_unref().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-17-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf b6076afcab block/export: Add node-name to BlockExportOptions
Every block export needs a block node to export, so add a 'node-name'
option to BlockExportOptions and remove the replaced option 'device'
from BlockExportOptionsNbd.

To maintain compatibility in nbd-server-add, BlockExportOptionsNbd needs
to be wrapped by a new type NbdServerAddOptions that adds 'device' back
because nbd-server-add doesn't use the BlockExportOptions base type at
all (so even without changing it to a 'node-name' option in
block-export-add, this compatibility code would be necessary).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-16-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf 8612c68673 block/export: Move AioContext from NBDExport to BlockExport
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-15-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf c69de1bef5 block/export: Move refcount from NBDExport to BlockExport
Having a refcount makes sense for all types of block exports. It is also
a prerequisite for keeping a list of all exports at the BlockExport
level.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-14-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf 1c8222b014 nbd: Add max-connections to nbd-server-start
This is a QMP equivalent of qemu-nbd's --shared option, limiting the
maximum number of clients that can attach at the same time.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200924152717.287415-9-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf 9b562c646b block/export: Remove magic from block-export-add
nbd-server-add tries to be convenient and adds two questionable
features that we don't want to share in block-export-add, even for NBD
exports:

1. When requesting a writable export of a read-only device, the export
   is silently downgraded to read-only. This should be an error in the
   context of block-export-add.

2. When using a BlockBackend name, unplugging the device from the guest
   will automatically stop the NBD server, too. This may sometimes be
   what you want, but it could also be very surprising. Let's keep
   things explicit with block-export-add. If the user wants to stop the
   export, they should tell us so.

Move these things into the nbd-server-add QMP command handler so that
they apply only there.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-8-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf 56ee86261e block/export: Add BlockExport infrastructure and block-export-add
We want to have a common set of commands for all types of block exports.
Currently, this is only NBD, but we're going to add more types.

This patch adds the basic BlockExport and BlockExportDriver structs and
a QMP command block-export-add that creates a new export based on the
given BlockExportOptions.

qmp_nbd_server_add() becomes a wrapper around qmp_block_export_add().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-5-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf 143ea7670c qapi: Rename BlockExport to BlockExportOptions
The name BlockExport will be used for the struct containing the runtime
state of block exports, so change the name of export creation options.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200924152717.287415-4-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf 5daa6bfd8e qapi: Create block-export module
Move all block export related types and commands from block-core to the
new QAPI module block-export.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200924152717.287415-3-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Philippe Mathieu-Daudé 74f2e02766 block/sheepdog: Replace magic val by NANOSECONDS_PER_SECOND definition
Use self-explicit NANOSECONDS_PER_SECOND definition instead
of magic value.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200921110145.520944-1-philmd@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Philippe Mathieu-Daudé f68c01470b qapi: Restrict query-uuid command to machine code
Only qemu-system-FOO and qemu-storage-daemon provide QMP
monitors, therefore such declarations and definitions are
irrelevant for user-mode emulation.

Restricting the query-uuid command to machine.json pulls less
QAPI-generated code into user-mode.

Acked-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200913195348.1064154-6-philmd@redhat.com>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2020-09-29 15:41:35 +02:00
Stefan Hajnoczi d73415a315 qemu/atomic.h: rename atomic_ to qatomic_
clang's C11 atomic_fetch_*() functions only take a C11 atomic type
pointer argument. QEMU uses direct types (int, etc) and this causes a
compiler error when a QEMU code calls these functions in a source file
that also included <stdatomic.h> via a system header file:

  $ CC=clang CXX=clang++ ./configure ... && make
  ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int *' invalid)

Avoid using atomic_*() names in QEMU's atomic.h since that namespace is
used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h
and <stdatomic.h> can co-exist. I checked /usr/include on my machine and
searched GitHub for existing "qatomic_" users but there seem to be none.

This patch was generated using:

  $ git grep -h -o '\<atomic\(64\)\?_[a-z0-9_]\+' include/qemu/atomic.h | \
    sort -u >/tmp/changed_identifiers
  $ for identifier in $(</tmp/changed_identifiers); do
        sed -i "s%\<$identifier\>%q$identifier%g" \
            $(git grep -I -l "\<$identifier\>")
    done

I manually fixed line-wrap issues and misaligned rST tables.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200923105646.47864-1-stefanha@redhat.com>
2020-09-23 16:07:44 +01:00
Daniel P. Berrangé b18a24a9f8 block/file: switch to use qemu_open/qemu_create for improved errors
Currently at startup if using cache=none on a filesystem lacking
O_DIRECT such as tmpfs, at startup QEMU prints

qemu-system-x86_64: -drive file=/tmp/foo.img,cache=none: file system may not support O_DIRECT
qemu-system-x86_64: -drive file=/tmp/foo.img,cache=none: Could not open '/tmp/foo.img': Invalid argument

while at QMP level the hint is missing, so QEMU reports just

  "error": {
      "class": "GenericError",
      "desc": "Could not open '/tmp/foo.img': Invalid argument"
  }

which is close to useless for the end user trying to figure out what
they did wrong.

With this change at startup QEMU prints

qemu-system-x86_64: -drive file=/tmp/foo.img,cache=none: Unable to open '/tmp/foo.img': filesystem does not support O_DIRECT

while at the QMP level QEMU reports a massively more informative

  "error": {
     "class": "GenericError",
     "desc": "Unable to open '/tmp/foo.img': filesystem does not support O_DIRECT"
  }

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-16 10:33:48 +01:00
Daniel P. Berrangé 448058aa99 util: rename qemu_open() to qemu_open_old()
We want to introduce a new version of qemu_open() that uses an Error
object for reporting problems and make this it the preferred interface.
Rename the existing method to release the namespace for the new impl.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-16 10:33:48 +01:00
Stefano Garzarella 7bae7c805d block/rbd: add 'namespace' to qemu_rbd_strong_runtime_opts[]
Commit 19ae9ae014 ("block/rbd: Add support for ceph namespaces")
introduced namespace support for RBD, but we forgot to add the
new 'namespace' options to qemu_rbd_strong_runtime_opts[].

The 'namespace' is used to identify the image, so it is a strong
option since it can changes the data of a BDS.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1821528
Fixes: 19ae9ae014 ("block/rbd: Add support for ceph namespaces")
Cc: Florian Florensa <fflorensa@online.net>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20200914190553.74871-1-sgarzare@redhat.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:31:10 +02:00
Alberto Garcia bfd0989acf qcow2: Convert qcow2_alloc_cluster_offset() into qcow2_alloc_host_offset()
qcow2_alloc_cluster_offset() takes an (unaligned) guest offset and
returns the (aligned) offset of the corresponding cluster in the qcow2
image.

In practice none of the callers need to know where the cluster starts
so this patch makes the function calculate and return the final host
offset directly. The function is also renamed accordingly.

See 388e581615 for a similar change to qcow2_get_cluster_offset().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <9bfef50ec9200d752413be4fc2aeb22a28378817.1599833007.git.berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:31:10 +02:00
Alberto Garcia 8e958260c5 qcow2: Make preallocate_co() resize the image to the correct size
This function preallocates metadata structures and then extends the
image to its new size, but that new size calculation is wrong because
it doesn't take into account that the host_offset variable is always
cluster-aligned.

This problem can be reproduced with preallocation=metadata when the
original size is not cluster-aligned but the new size is. In this case
the final image size will be shorter than expected.

   qemu-img create -f qcow2 img.qcow2 31k
   qemu-img resize --preallocation=metadata img.qcow2 128k

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <adeb8b059917b141d5f5b3bd2a016262d3052c79.1599833007.git.berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
[mreitz: Mark compat=0.10 unsupported for iotest 125]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:30:36 +02:00
John Snow c1dadda02c block/qcow: remove runtime opts
Introduced by d85f4222b4,
These were seemingly never used at all.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20200806211345.2925343-3-jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:05:13 +02:00
John Snow 30b70f070f block/rbd: remove runtime_opts
This saw its last use in 4bfb274165.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20200806211345.2925343-2-jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:05:13 +02:00
Alberto Garcia 580384d637 qcow2: Return the original error code in qcow2_co_pwrite_zeroes()
This function checks the current status of a (sub)cluster in order to
see if an unaligned 'write zeroes' request can be done efficiently by
simply updating the L2 metadata and without having to write actual
zeroes to disk.

If the situation does not allow using the fast path then the function
returns -ENOTSUP and the caller falls back to writing zeroes.

If can happen however that the aforementioned check returns an actual
error code so in this case we should pass it to the caller.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200909123739.719-1-berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:05:13 +02:00
Alberto Garcia 3fec237fca qcow2: Make qcow2_free_any_clusters() free only one cluster
This function takes an L2 entry and a number of clusters to free.
Although in principle it can free any type of cluster (using the L2
entry to determine its type) in practice the API is broken because
compressed clusters have a variable size and there is no way to free
more than one without having the L2 entry of each one of them.

The good news all callers are passing nb_clusters=1 so we can simply
get rid of that parameter.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <77cea0f4616f921d37e971b3c5b18a2faa24b173.1599573989.git.berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:05:13 +02:00
Alberto Garcia 1a52b73dba qcow2: Handle QCowL2Meta on error in preallocate_co()
If qcow2_alloc_cluster_offset() or qcow2_alloc_cluster_link_l2() fail
then this function simply returns the error code, potentially leaking
the QCowL2Meta structure and leaving stale items in s->cluster_allocs.

A second problem is that this function calls qcow2_free_any_clusters()
on failure but passing a host cluster offset instead of an L2 entry.
Luckily for normal uncompressed clusters a raw offset also works like
a valid L2 entry so it works just the same, but we should be using
qcow2_free_clusters() instead.

This patch fixes both problems by using qcow2_handle_l2meta().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <cd3a6b9abd43f9c0b60be413d760f0cacc67eb66.1599573989.git.berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:05:13 +02:00
Swapnil Ingle 83a6a90009 block/vhdx: Support vhdx image only with 512 bytes logical sector size
block/vhdx uses qemu block layer where sector size is always 512 bytes.
This may have issues  with 4K logical sector sized vhdx image.

For e.g qemu-img convert on such images fails with following assert:

$qemu-img convert -f vhdx -O raw 4KTest1.vhdx test.raw
qemu-img: util/iov.c:388: qiov_slice: Assertion `offset + len <=
qiov->size' failed.
Aborted

This patch adds an check to return ENOTSUP for vhdx images which
have logical sector size other than 512 bytes.

Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Message-Id: <1596794594-44531-1-git-send-email-swapnil.ingle@nutanix.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:05:13 +02:00
Alberto Garcia 2b60c5b996 qcow2: Rewrite the documentation of qcow2_alloc_cluster_offset()
The current text corresponds to an earlier, simpler version of this
function and it does not explain how it works now.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <bb5bd06f07c5a05b0818611de0d06ec5b66c8df3.1599150873.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:05:13 +02:00
Alberto Garcia f7bd5bba1b qcow2: Don't check nb_clusters when removing l2meta from the list
In the past, when a new cluster was allocated the l2meta structure was
a variable in the stack so it was necessary to have a way to tell
whether it had been initialized and contained valid data or not. The
nb_clusters field was used for this purpose. Since commit f50f88b9fe
this is no longer the case, l2meta (nowadays a pointer to a list) is
only allocated when needed and nb_clusters is guaranteed to be > 0 so
this check is unnecessary.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <ab0b67c29c7ba26e598db35f12aa5ab5982539c1.1599150873.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:05:13 +02:00
Alberto Garcia 184581fa4d qcow2: Fix removal of list members from BDRVQcow2State.cluster_allocs
When a write request needs to allocate new clusters (or change the L2
bitmap of existing ones) a QCowL2Meta structure is created so the L2
metadata can be later updated and any copy-on-write can be performed
if necessary.

A write request can span a region consisting of an arbitrary
combination of previously unallocated and allocated clusters, and if
the unallocated ones can be put contiguous to the existing ones then
QEMU will do so in order to minimize the number of write operations.

In practice this means that a write request has not just one but a
number of QCowL2Meta structures. All of them are added to the
cluster_allocs list that is stored in BDRVQcow2State and is used to
detect overlapping requests. After the write request finishes all its
associated QCowL2Meta are removed from that list. calculate_l2_meta()
takes care of creating and putting those structures in the list, and
qcow2_handle_l2meta() takes care of removing them.

The problem is that the error path in handle_alloc() also tries to
remove an item in that list, a remnant from the time when this was
handled there (that code would not even be correct anymore because
it only removes one struct and not all the ones from the same write
request).

This can trigger a double removal of the same item from the list,
causing a crash. This is not easy to reproduce in practice because
it requires that do_alloc_cluster_offset() fails after a successful
previous allocation during the same write request, but it can be
reproduced with the included test case.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <3440a1c4d53c4fe48312b478c96accb338cbef7c.1599150873.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:05:13 +02:00
Alberto Garcia 02b1ecfa10 qcow2: Use macros for the L1, refcount and bitmap table entry sizes
This patch replaces instances of sizeof(uint64_t) in the qcow2 driver
with macros that indicate what those sizes are actually referring to.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200828110828.13833-1-berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:05:12 +02:00
Lukas Straub 5eb9a3c7b0 block/quorum.c: stable children names
If we remove the child with the highest index from the quorum,
decrement s->next_child_index. This way we get stable children
names as long as we only remove the last child.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Fixes: https://bugs.launchpad.net/bugs/1881231
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <5d5f930424c1c770754041aa8ad6421dc4e2b58e.1596536719.git.lukasstraub2@web.de>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:05:12 +02:00
Peter Maydell 2499453eb1 Block layer patches:
- qemu-img create: Fail gracefully when backing file is an empty string
 - Fixes related to filter block nodes ("Deal with filters" series)
 - block/nvme: Various cleanups required to use multiple queues
 - block/nvme: Use NvmeBar structure from "block/nvme.h"
 - file-win32: Fix "locking" option
 - iotests: Allow running from different directory
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAl9Z7bcRHGt3b2xmQHJl
 ZGhhdC5jb20ACgkQfwmycsiPL9ZS6w//bos+A0RfRRF0YFWkIBLQWxqzKcGvMJ8W
 XWv3mFzd47UaDgRYwVnCC3CR6bLYEINISngZ3geA4jI1+w7AtYKDOO0HN32dUg+D
 ZrNMn02701CA6qkmpxJ+yjsrl9ltR3jYe0me4Wr39Pvdexa2pl/e+M4Vas6FhkYL
 ghAwNThypscGCrFjAlz3ru2Sc/K+sPWrGoqkzr+SWvsm9wy4vb8aLxr8Yy50x/zc
 CqALS9SQ/YA93BCVi9CzPkVyV3ioA0kg/y38WvLtAQ9GZ3m/ekMro3WvdYsRsFCN
 LGXsuwFig+U7Kd7lJrCS9TLnlTJstNGqPq9jEoV5cThPvGknFfMvVOzRmmP7tzqT
 YRcPRy39z44OoLKa3kyg3aF38BTxt+9gPqBnivKMr9j9EecMvPsXXHRvF+lP+LsP
 j753Ih561hX6FurcjX8pc9GOM2cQA0GjlyL77UTTAmLZyFXP/8e55oQbBuYTylc/
 Xlvmc/T+yEGiEGTnK+FxgDAiUaxbCCM9cDVStJjTvsIq43dwXb48g1onDsGZ5eDf
 j9lmAD6TJxHNOB5ErNsDPODf4/D1wJ9t9WVF8UZp9ArfPHRdxMzT7Q4LvetaDmVl
 +hQC9cgTq8Qd8LwSqbKEYua4L6iGbmLAT7/N6htq5L1eVLg76/tLg/tKSwh/vKAY
 yzPmyHaVK84=
 =gaaW
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- qemu-img create: Fail gracefully when backing file is an empty string
- Fixes related to filter block nodes ("Deal with filters" series)
- block/nvme: Various cleanups required to use multiple queues
- block/nvme: Use NvmeBar structure from "block/nvme.h"
- file-win32: Fix "locking" option
- iotests: Allow running from different directory

# gpg: Signature made Thu 10 Sep 2020 10:11:19 BST
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (65 commits)
  block/qcow2-cluster: Add missing "fallthrough" annotation
  block/nvme: Pair doorbell registers
  block/nvme: Use generic NvmeBar structure
  block/nvme: Group controller registers in NVMeRegs structure
  file-win32: Fix "locking" option
  iotests: Allow running from different directory
  iotests: Test committing to overridden backing
  iotests: Add test for commit in sub directory
  iotests: Add filter mirror test cases
  iotests: Add filter commit test cases
  iotests: Let complete_and_wait() work with commit
  iotests: Test that qcow2's data-file is flushed
  block: Leave BDS.backing_{file,format} constant
  block: Inline bdrv_co_block_status_from_*()
  blockdev: Fix active commit choice
  block: Drop backing_bs()
  qemu-img: Use child access functions
  nbd: Use CAF when looking for dirty bitmap
  commit: Deal with filters
  backup: Deal with filters
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-09-11 14:47:49 +01:00
Thomas Huth b9be6faed1 block/qcow2-cluster: Add missing "fallthrough" annotation
When compiling with -Werror=implicit-fallthrough, the compiler currently
complains:

../../devel/qemu/block/qcow2-cluster.c: In function ‘cluster_needs_new_alloc’:
../../devel/qemu/block/qcow2-cluster.c:1320:12: error: this statement may fall
 through [-Werror=implicit-fallthrough=]
         if (l2_entry & QCOW_OFLAG_COPIED) {
            ^
../../devel/qemu/block/qcow2-cluster.c:1323:5: note: here
     case QCOW2_CLUSTER_UNALLOCATED:
     ^~~~

It's quite obvious that the fallthrough is intended here, so let's add
a comment to silence the compiler warning.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20200908070028.193298-1-thuth@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-10 11:11:13 +02:00
Philippe Mathieu-Daudé e5ff22ba9f block/nvme: Pair doorbell registers
For each queue doorbell registers are paired as:
- Submission Queue Tail Doorbell
- Completion Queue Head Doorbell

Reflect that in the NVMeRegs structure, and adapt
nvme_create_queue_pair() accordingly.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200904124130.583838-4-philmd@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Fam Zheng <fam@euphon.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-10 11:11:13 +02:00
Philippe Mathieu-Daudé c7100f0a0b block/nvme: Use generic NvmeBar structure
Commit f3c507adcd ("NVMe: Initial commit for new storage interface")
introduced the NvmeBar structure. Unfortunately in commit bdd6a90a9e
("block: Add VFIO based NVMe driver") we duplicated it.

Apparently in commit a3d9a352d4 ("block: Move NVMe constants to
a separate header") we tried to unify headers but forgot to remove
the structure declared in the block/nvme.c source file.

Do it now, and remove the structure size check which is redundant
with the header check added in commit 74e18435c0 ("hw/block/nvme:
Align I/O BAR to 4 KiB").

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200904124130.583838-3-philmd@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Fam Zheng <fam@euphon.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-10 11:11:13 +02:00
Philippe Mathieu-Daudé 0ea32f34ce block/nvme: Group controller registers in NVMeRegs structure
We want to use the NvmeBar structure from "block/nvme.h" in the
next commit. As a preliminary step, group all the NVMe controller
registers in the 'ctrl' field, keeping the doorbells registers
out of it.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200904124130.583838-2-philmd@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Fam Zheng <fam@euphon.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-10 11:11:12 +02:00
Kevin Wolf 3b079ac0ff file-win32: Fix "locking" option
The intended behaviour was that locking=off/auto work and have no
effect (to remain compatible with file-posix), whereas locking=on would
return an error. Unfortunately, the code forgot to remove "locking" from
the options QDict, so any attempt to use the option would fail.

Replace the option parsing code for "locking" with something that is
part of the raw_runtime_opts QemuOptsList (so it is properly removed
from the QDict) and looks more like file-posix.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200907092739.9988-1-kwolf@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-10 11:11:12 +02:00
Markus Armbruster b15e402fc8 trace-events: Fix attribution of trace points to source
Some trace points are attributed to the wrong source file.  Happens
when we neglect to update trace-events for code motion, or add events
in the wrong place, or misspell the file name.

Clean up with help of scripts/cleanup-trace-events.pl.  Funnies
requiring manual post-processing:

* accel/tcg/cputlb.c trace points are in trace-events.

* block.c and blockdev.c trace points are in block/trace-events.

* hw/block/nvme.c uses the preprocessor to hide its trace point use
  from cleanup-trace-events.pl.

* hw/tpm/tpm_spapr.c uses pseudo trace point tpm_spapr_show_buffer to
  guard debug code.

* include/hw/xen/xen_common.h trace points are in hw/xen/trace-events.

* linux-user/trace-events abbreviates a tedious list of filenames to
  */signal.c.

* net/colo-compare and net/filter-rewriter.c use pseudo trace points
  colo_compare_miscompare and colo_filter_rewriter_debug to guard
  debug code.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200806141334.3646302-5-armbru@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-09-09 17:17:58 +01:00
Markus Armbruster 6ec9379870 trace-events: Delete unused trace points
Tracked down with the help of scripts/cleanup-trace-events.pl.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20200806141334.3646302-4-armbru@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-09-09 17:17:02 +01:00
Max Reitz 0b877d09df block: Leave BDS.backing_{file,format} constant
Parts of the block layer treat BDS.backing_file as if it were whatever
the image header says (i.e., if it is a relative path, it is relative to
the overlay), other parts treat it like a cache for
bs->backing->bs->filename (relative paths are relative to the CWD).
Considering bs->backing->bs->filename exists, let us make it mean the
former.

Among other things, this now allows the user to specify a base when
using qemu-img to commit an image file in a directory that is not the
CWD (assuming, everything uses relative filenames).

Before this patch:

$ ./qemu-img create -f qcow2 foo/bot.qcow2 1M
$ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2
$ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2
$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'

After this patch:

$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
Image committed.
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
Image committed.

With this change, bdrv_find_backing_image() must look at whether the
user has overridden a BDS's backing file.  If so, it can no longer use
bs->backing_file, but must instead compare the given filename against
the backing node's filename directly.

Note that this changes the QAPI output for a node's backing_file.  We
had very inconsistent output there (sometimes what the image header
said, sometimes the actual filename of the backing image).  This
inconsistent output was effectively useless, so we have to decide one
way or the other.  Considering that bs->backing_file usually at runtime
contained the path to the image relative to qemu's CWD (or absolute),
this patch changes QAPI's backing_file to always report the
bs->backing->bs->filename from now on.  If you want to receive the image
header information, you have to refer to full-backing-filename.

This necessitates a change to iotest 228.  The interesting information
it really wanted is the image header, and it can get that now, but it
has to use full-backing-filename instead of backing_file.  Because of
this patch's changes to bs->backing_file's behavior, we also need some
reference output changes.

Along with the changes to bs->backing_file, stop updating
BDS.backing_format in bdrv_backing_attach() as well.  This way,
ImageInfo's backing-filename and backing-filename-format fields will
represent what the image header says and nothing else.

iotest 245 changes in behavior: With the backing node no longer
overriding the parent node's backing_file string, you can now omit the
@backing option when reopening a node with neither a default nor a
current backing file even if it used to have a backing node at some
point.

273 also changes: The base image is opened without a format layer, so
ImageInfo.backing-filename-format used to report "file" for the base
image's overlay after blockdev-snapshot.  However, the image header
never says "file" anywhere, so it now reports $IMGFMT.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz 549ec0d978 block: Inline bdrv_co_block_status_from_*()
With bdrv_filter_bs(), we can easily handle this default filter behavior
in bdrv_co_block_status().

blkdebug wants to have an additional assertion, so it keeps its own
implementation, except bdrv_co_block_status_from_file() needs to be
inlined there.

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz 9a71b9de3f commit: Deal with filters
This includes some permission limiting (for example, we only need to
take the RESIZE permission if the base is smaller than the top).

Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz 2b088c60bb backup: Deal with filters
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz 3f072a7fb7 mirror: Deal with filters
This includes some permission limiting (for example, we only need to
take the RESIZE permission for active commits where the base is smaller
than the top).

base_overlay is introduced so we can query bdrv_is_allocated_above() on
it - we cannot do that with base itself, because a filter's block_status
is the same as its child node, so if there are filters on base,
bdrv_is_allocated_above() on base would return information including
base.

Use this opportunity to rename qmp_drive_mirror()'s "source" BDS to
"target_backing_bs", because that is what it really refers to.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz c6f6d8462c block-copy: Use CAF to find sync=top base
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz 0a7585dbba block: Use child access functions for QAPI queries
query-block, query-named-block-nodes, and query-blockstats now return
any filtered child under "backing", not just bs->backing or COW
children.  This is so that filters do not interrupt the reported backing
chain.  This changes the output for iotest 184, as the throttled node
now appears as a backing child.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz 3f26191c73 block: Report data child for query-blockstats
It makes no sense to report the block stats of a purely metadata-storing
child in query-blockstats.  So if the primary child does not have any
data, try to find a unique data-storing child.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz 07cd7b659a block/null: Implement bdrv_get_allocated_file_size
It is trivial, so we might as well do it.

Remove _filter_actual_image_size from iotest 184, so we get to see the
result in its reference output.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz c8af87573f block/snapshot: Fix fallback
If the top node's driver does not provide snapshot functionality and we
want to fall back to a node down the chain, we need to snapshot all
non-COW children.  For simplicity's sake, just do not fall back if there
is more than one such child.  Furthermore, we really only can fall back
to bs->file and bs->backing, because bdrv_snapshot_goto() has to modify
the child link (notably, set it to NULL).

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz c4db2e25df block: Use CAF in bdrv_co_rw_vmstate()
If a node whose driver does not provide VM state functions has a
metadata child, the VM state should probably go there; if it is a
filter, the VM state should probably go there.  It follows that we
should generally go down to the primary child.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz 66b129ac5e block: Iterate over children in refresh_limits
Instead of looking at just bs->file and bs->backing, we should look at
all children that could end up receiving forwarded requests.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz fb787f02a6 vmdk: Drop vmdk_co_flush()
Before HEAD^, we needed this because bdrv_co_flush() by itself would
only flush bs->file.  With HEAD^, bdrv_co_flush() will flush all
children on which a WRITE or WRITE_UNCHANGED permission has been taken.
Thus, vmdk no longer needs to do it itself.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz 883833e29c block: Flush all children in generic code
If the driver does not support .bdrv_co_flush() so bdrv_co_flush()
itself has to flush the children of the given node, it should not flush
just bs->file->bs, but in fact all children that might have been written
to (judging from the permissions taken on them).

This is a bug fix for qcow2 images with an external data file, as they
so far did not flush that data_file node.

In any case, the BLKDBG_EVENT() should be emitted on the primary child,
because that is where a blkdebug node would be if there is any.

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz 23b93525a2 block: Use bdrv_cow_child() in bdrv_co_truncate()
The condition modified here is not about potentially filtered children,
but only about COW sources (i.e. traditional backing files).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz 67acfd2188 stream: Deal with filters
Because of the (not so recent anymore) changes that make the stream job
independent of the base node and instead track the node above it, we
have to split that "bottom" node into two cases: The bottom COW node,
and the node directly above the base node (which may be an R/W filter
or the bottom COW node).

Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz cb8503159a block: Use CAFs in block status functions
Use the child access functions in the block status inquiry functions as
appropriate.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz 93393e698c block: Use bdrv_filter_(bs|child) where obvious
Places that use patterns like

    if (bs->drv->is_filter && bs->file) {
        ... something about bs->file->bs ...
    }

should be

    BlockDriverState *filtered = bdrv_filter_bs(bs);
    if (filtered) {
        ... something about @filtered ...
    }

instead.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz 4935e8be22 copy-on-read: Support compressed writes
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz e7e754aec3 throttle: Support compressed writes
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz 8b8277cdb0 block: Drop bdrv_is_encrypted()
The original purpose of bdrv_is_encrypted() was to inquire whether a BDS
can be used without the user entering a password or not.  It has not
been used for that purpose for quite some time.

Actually, it is not even fit for that purpose, because to answer that
question, it would have recursively query all of the given node's
children.

So now we have to decide in which direction we want to fix
bdrv_is_encrypted(): Recursively query all children, or drop it and just
use bs->encrypted to get the current node's status?

Nowadays, its only purpose is to report through bdrv_query_image_info()
whether the given image is encrypted or not.  For this purpose, it is
probably more interesting to see whether a given node itself is
encrypted or not (otherwise, a management application cannot discern for
certain which nodes are really encrypted and which just have encrypted
children).

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé b111b3fcde block/nvme: Use an array of EventNotifier
In preparation of using multiple IRQ (thus multiple eventfds)
make BDRVNVMeState::irq_notifier an array (for now of a single
element, the admin queue notifier).

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-16-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé 7a1fb2ef40 block/nvme: Extract nvme_poll_queue()
As we want to do per-queue polling, extract the nvme_poll_queue()
method which operates on a single queue.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-15-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé 0a28b02ef9 block/nvme: Simplify nvme_create_queue_pair() arguments
nvme_create_queue_pair() doesn't require BlockDriverState anymore.
Replace it by BDRVNVMeState and AioContext to simplify.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-14-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé 073a06978c block/nvme: Replace BDRV_POLL_WHILE by AIO_WAIT_WHILE
BDRV_POLL_WHILE() is defined as:

  #define BDRV_POLL_WHILE(bs, cond) ({          \
      BlockDriverState *bs_ = (bs);             \
      AIO_WAIT_WHILE(bdrv_get_aio_context(bs_), \
                     cond); })

As we will remove the BlockDriverState use in the next commit,
start by using the exploded version of BDRV_POLL_WHILE().

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-13-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé 3a6d34d066 block/nvme: Simplify nvme_init_queue() arguments
nvme_init_queue() doesn't require BlockDriverState anymore.
Replace it by BDRVNVMeState to simplify.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-12-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé 38e1f8186f block/nvme: Replace qemu_try_blockalign(bs) by qemu_try_memalign(pg_sz)
qemu_try_blockalign() is a generic API that call back to the
block driver to return its page alignment. As we call from
within the very same driver, we already know to page alignment
stored in our state. Remove indirections and use the value from
BDRVNVMeState.
This change is required to later remove the BlockDriverState
argument, to make nvme_init_queue() per hardware, and not per
block driver.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-11-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé 2ed846930d block/nvme: Replace qemu_try_blockalign0 by qemu_try_blockalign/memset
In the next commit we'll get rid of qemu_try_blockalign().
To ease review, first replace qemu_try_blockalign0() by explicit
calls to qemu_try_blockalign() and memset().

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-10-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé 7d3b214ae4 block/nvme: Use union of NvmeIdCtrl / NvmeIdNs structures
We allocate an unique chunk of memory then use it for two
different structures. By using an union, we make it clear
the data is overlapping (and we can remove the casts).

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-9-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:24:53 +02:00
Philippe Mathieu-Daudé 4d98093937 block/nvme: Rename local variable
We are going to modify the code in the next commit. Renaming
the 'resp' variable to 'id' first makes the next commit easier
to review. No logical changes.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-8-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé c8edbfb2cc block/nvme: Use common error path in nvme_add_io_queue()
Rearrange nvme_add_io_queue() by using a common error path.
This will be proven useful in few commits where we add IRQ
notification to the IO queues.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-7-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé bf6ce5ec6d block/nvme: Improve error message when IO queue creation failed
Do not use the same error message for different failures.
Display a different error whether it is the CQ or the SQ.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-6-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé 73159e52e6 block/nvme: Define INDEX macros to ease code review
Use definitions instead of '0' or '1' indexes. Also this will
be useful when using multi-queues later.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-5-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé 0ea45f76eb block/nvme: Let nvme_create_queue_pair() fail gracefully
As nvme_create_queue_pair() is allowed to fail, replace the
alloc() calls by try_alloc() to avoid aborting QEMU.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-4-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé e266f52cfb block/nvme: Avoid further processing if trace event not enabled
Avoid further processing if TRACE_NVME_SUBMIT_COMMAND_RAW is
not enabled. This is an untested intend of performance optimization.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-3-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé e4f310fe7f block/nvme: Replace magic value by SCALE_MS definition
Use self-explicit SCALE_MS definition instead of magic value.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-2-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:23:55 +02:00
Peter Maydell df8176274a nbd patches for 2020-09-02
- fix a few iotests affected by earlier nbd changes
 - avoid blocking qemu by nbd client in connect()
 - build qemu-nbd for mingw
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAl9QFB8ACgkQp6FrSiUn
 Q2pTwggArPN7dRwm1KD9jL8X6PV01uhLuLRFzrrofFX22Blroj5XPbR24BdKeN8V
 uDSFGzzoz2Vx3vKPHyZdx7bbeEL0pF9dzjZX6JwzT0McbVuge5aG2zC/ARdfc4PN
 1Yf2FB/nY8Xt5G12usu3FIz7JBoQNm4mlPnqVqf7t0LQxgUFvO7F2LernyqEOYKS
 uSpXHNFqddZcax7etXeldJOlSGJBQTaCmplrJbw2ilVhLZJD+0OglY4SAsrrenN+
 gb/KD4REjhtQsmoTaqthGCnGXipEoJYDfEOMhbkl0UK+9Mx0t+3cj3tflG0eGldM
 ERk1d4d7VSlyqIy7w43v3+IB8M99ag==
 =7Cn5
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2020-09-02' into staging

nbd patches for 2020-09-02

- fix a few iotests affected by earlier nbd changes
- avoid blocking qemu by nbd client in connect()
- build qemu-nbd for mingw

# gpg: Signature made Wed 02 Sep 2020 22:52:31 BST
# gpg:                using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full]
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full]
# gpg:                 aka "[jpeg image of size 6874]" [full]
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-nbd-2020-09-02:
  nbd: disable signals and forking on Windows builds
  nbd: skip SIGTERM handler if NBD device support is not built
  block: add missing socket_init() calls to tools
  block/nbd: use non-blocking connect: fix vm hang on connect()
  iotests/259: Fix reference output
  iotests/059: Fix reference output

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-09-03 21:35:01 +01:00
Vladimir Sementsov-Ogievskiy 1dc4718d84 block/nbd: use non-blocking connect: fix vm hang on connect()
This makes nbd's connection_co yield during reconnects, so that
reconnect doesn't block the main thread. This is very important in
case of an unavailable nbd server host: connect() call may take a long
time, blocking the main thread (and due to reconnect, it will hang
again and again with small gaps of working time during pauses between
connection attempts).

Realization notes:

 - We don't want to implement non-blocking connect() over non-blocking
 socket, because getaddrinfo() doesn't have portable non-blocking
 realization anyway, so let's just use a thread for both getaddrinfo()
 and connect().

 - We can't use qio_channel_socket_connect_async (which behaves
 similarly and starts a thread to execute connect() call), as it's relying
 on someone iterating main loop (g_main_loop_run() or something like
 this), which is not always the case.

 - We can't use thread_pool_submit_co API, as thread pool waits for all
 threads to finish (but we don't want to wait for blocking reconnect
 attempt on shutdown.

 So, we just create the thread by hand. Some additional difficulties
 are:

 - We want our connect to avoid blocking drained sections and aio context
 switches. To achieve this, we make it possible to "cancel" synchronous
 wait for the connect (which is a coroutine yield actually), still,
 the thread continues in background, and if successful, its result may be
 reused on next reconnect attempt.

 - We don't want to wait for reconnect on shutdown, so there is
 CONNECT_THREAD_RUNNING_DETACHED thread state, which means that the block
 layer is no longer interested in a result, and thread should close new
 connected socket on finish and free the state.

How to reproduce the bug, fixed with this commit:

1. Create an image on node1:
   qemu-img create -f qcow2 xx 100M

2. Start NBD server on node1:
   qemu-nbd xx

3. Start vm with second nbd disk on node2, like this:

  ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \
    file=/work/images/cent7.qcow2 -drive file=nbd+tcp://192.168.100.2 \
    -vnc :0 -qmp stdio -m 2G -enable-kvm -vga std

4. Access the vm through vnc (or some other way?), and check that NBD
   drive works:

   dd if=/dev/sdb of=/dev/null bs=1M count=10

   - the command should succeed.

5. Now, let's trigger nbd-reconnect loop in Qemu process. For this:

5.1 Kill NBD server on node1

5.2 run "dd if=/dev/sdb of=/dev/null bs=1M count=10" in the guest
    again. The command should fail and a lot of error messages about
    failing disk may appear as well.

    Now NBD client driver in Qemu tries to reconnect.
    Still, VM works well.

6. Make node1 unavailable on NBD port, so connect() from node2 will
   last for a long time:

   On node1 (Note, that 10809 is just a default NBD port):

   sudo iptables -A INPUT -p tcp --dport 10809 -j DROP

   After some time the guest hangs, and you may check in gdb that Qemu
   hangs in connect() call, issued from the main thread. This is the
   BUG.

7. Don't forget to drop iptables rule from your node1:

   sudo iptables -D INPUT -p tcp --dport 10809 -j DROP

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200812145237.4396-1-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: minor wording and formatting tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-09-02 16:47:23 -05:00
Peter Maydell e4d8b7c1a9 qemu-nvme
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE28EdLTc7SjdV9QLsYlFWYQpPbMAFAl9Pro4ACgkQYlFWYQpP
 bMC1CRAAkawTI4mMcOfI3smFoMeiY8kZJWXJUBXfHbMJ4asaoIjTkH/lXRXBw7KQ
 sH5tB9CuOums3VjagkZ0Sw6R/kP1LbyJTAwq/pwOXwYRDc/E3zQpMblkIHH1boIM
 Bxl814hw3hBqV+D0wgeKpl83lbiOpd10Cbpdb/xNKat6qVquLGurSGKgA7jNuF4s
 oPTPtfZpyH9LUr4DV+sL+fGX6vaCdSFZPZUhJqwFfx79+r3+YiHGLAE6fgsdGDJt
 2RSSKMqBe2gg0BY5ToW9L55BsLnwMMrAZnGzEkeZvRKqm0JZBXQsERa61p4VEAJf
 uYkSEqOwsKjXQNTdDEekyH67AkgXaoqG0hiiOcgoLsla7C0zROtoKcfVM/+WC0LT
 T0/bfgubmoDV8kLzPuOV8xOGxjfbu4Qlxy1JsIC6BU4zBQvpDwOeTx3MUWaCUfvk
 YmDMEhZWGcZ3RBLrgQmzm4ZKMtGdYXnGQz5dwVkRRfghQs2fl5ZmUjGR7MqKe18n
 4K0nzhPiXbOTlqvLVvzVlrBzdc8ECAs1kVoJF7C3LwRmXbT2N/fUhZP/nYpeM2Hj
 DQNmA8KpXMKae2+2iDnQNWbvdpz3SiHD6dK7A1bEsdoG0L60xfyeAF+JuPiESUnd
 OAhf+muxKiInv2k5GNh7mDZPWM6nDepf/PZP6ohc7dKxVam7N2M=
 =Y23H
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/nvme/tags/pull-nvme-20200902' into staging

qemu-nvme

# gpg: Signature made Wed 02 Sep 2020 15:39:10 BST
# gpg:                using RSA key DBC11D2D373B4A3755F502EC625156610A4F6CC0
# gpg: Good signature from "Keith Busch <kbusch@kernel.org>" [unknown]
# gpg:                 aka "Keith Busch <keith.busch@gmail.com>" [unknown]
# gpg:                 aka "Keith Busch <keith.busch@intel.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: DBC1 1D2D 373B 4A37 55F5  02EC 6251 5661 0A4F 6CC0

* remotes/nvme/tags/pull-nvme-20200902: (39 commits)
  hw/block/nvme: remove explicit qsg/iov parameters
  hw/block/nvme: use preallocated qsg/iov in nvme_dma_prp
  hw/block/nvme: consolidate qsg/iov clearing
  hw/block/nvme: add ns/cmd references in NvmeRequest
  hw/block/nvme: be consistent about zeros vs zeroes
  hw/block/nvme: add check for mdts
  hw/block/nvme: refactor request bounds checking
  hw/block/nvme: verify validity of prp lists in the cmb
  hw/block/nvme: add request mapping helper
  hw/block/nvme: add tracing to nvme_map_prp
  hw/block/nvme: refactor dma read/write
  hw/block/nvme: destroy request iov before reuse
  hw/block/nvme: remove redundant has_sg member
  hw/block/nvme: replace dma_acct with blk_acct equivalent
  hw/block/nvme: add mapping helpers
  hw/block/nvme: memset preallocated requests structures
  hw/block/nvme: bump supported version to v1.3
  hw/block/nvme: provide the mandatory subnqn field
  hw/block/nvme: enforce valid queue creation sequence
  hw/block/nvme: reject invalid nsid values in active namespace id list
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-09-02 21:20:20 +01:00
Klaus Jensen 69265150aa hw/block/nvme: be consistent about zeros vs zeroes
The NVM Express specification generally uses 'zeroes' and not 'zeros',
so let us align with it.

Cc: Fam Zheng <fam@euphon.net>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
2020-09-02 08:48:50 +02:00
Klaus Jensen c26f217370 hw/block/nvme: bump spec data structures to v1.3
Add missing fields in the Identify Controller and Identify Namespace
data structures to bring them in line with NVMe v1.3.

This also adds data structures and defines for SGL support which
requires a couple of trivial changes to the nvme block driver as well.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Fam Zheng <fam@euphon.net>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Message-Id: <20200706061303.246057-2-its@irrelevant.dk>
2020-09-02 08:48:50 +02:00
Peter Maydell 887adde81d meson fixes:
* bump submodule to 0.55.1
 * SDL, pixman and zlib fixes
 * firmwarepath fix
 * fix firmware builds
 
 meson related:
 * move install to Meson
 * move NSIS to Meson
 * do not make meson use cmake
 * add description to options
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl9OcpcUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroM9+Af+InfjEFtsoubgQA2L7B1sHeksINOI
 nMAw9plmJSX0Qabp0PJDrMcBiLZlbdFGCm/88heTyDGRtbhIXLUrE2J0dyV4R4nR
 OSyuTgDna75QLxy6k1dIh6qVAtcj2hg+CxaJrUf2Ix6A1d1PAfWoweNXSF1LBmJf
 pyQ39eZXStI7/bkmwgTY3qK1gwjEaskvf68fTp1hgTN0VHwUb23/nucPaizbVF+a
 A0nWprPQaTjWhAFQ2jJesBbhN6FtD3EnOZs56JOSQ9J7W6uDnw8a+tOdEDbSYKe2
 DdEUEjcz2cqAMCiWrzOMgxB8T885H+8yE6jgEmPHWaUUORQADMcBDMihNA==
 =NGLI
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging

meson fixes:
* bump submodule to 0.55.1
* SDL, pixman and zlib fixes
* firmwarepath fix
* fix firmware builds

meson related:
* move install to Meson
* move NSIS to Meson
* do not make meson use cmake
* add description to options

# gpg: Signature made Tue 01 Sep 2020 17:11:03 BST
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini-gitlab/tags/for-upstream: (26 commits)
  Makefile: Fix in-tree clean/distclean
  Makefile: Add back TAGS/ctags/cscope rules
  meson: add description to options
  build: fix recurse-all target
  meson: use pkg-config method to find dependencies
  configure: do not include ${prefix} in firmwarepath
  meson: add pixman dependency to UI modules
  meson: add pixman dependency to chardev/baum module
  meson: add NSIS building
  meson: use meson mandir instead of qemu_mandir
  meson: pass docdir option
  meson: use meson datadir instead of qemu_datadir
  meson: pass qemu_suffix option
  configure: build docdir like other suffixed directories
  configure: always /-seperate directory from qemu_suffix
  configure: rename confsuffix option
  meson: move zlib detection to meson
  build-sys: remove install target from Makefile
  meson: install $localstatedir/run for qga
  meson: install desktop file
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-09-01 22:50:23 +01:00
Liao Pingfang f181ab4ba5 block/vmdk: Remove superfluous breaks
Remove superfluous breaks, as there is a "return" before them.

Signed-off-by: Liao Pingfang <liao.pingfang@zte.com.cn>
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1594631107-36574-1-git-send-email-wang.yi59@zte.com.cn>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-09-01 08:37:28 +02:00
Paolo Bonzini a10c8516ed block: always link with zlib
The qcow2 driver needs the zlib dependency.  While emulators
provided it through the migration code, this is not true of
the tools.  Move the dependency from the qcow1 rule directly
into block_ss so that it is included unconditionally.

Fixes build with --disable-qcow1.

Reported-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-01 01:51:51 -04:00
Eduardo Habkost 7c9dcd6cab throttle-groups: Move ThrottleGroup typedef to header
Move typedef closer to the type check macros, to make it easier
to convert the code to OBJECT_DEFINE_TYPE() in the future.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Tested-By: Roman Bolshakov <r.bolshakov@yadro.com>
Message-Id: <20200825192110.3528606-17-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-08-27 14:04:54 -04:00
Alberto Garcia 7bbb59202a qcow2: Assert that expand_zero_clusters_in_l1() does not support subclusters
This function is only used by qcow2_expand_zero_clusters() to
downgrade a qcow2 image to a previous version. This would require
transforming all extended L2 entries into normal L2 entries but this
is not a simple task and there are no plans to implement this at the
moment.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <15e65112b4144381b4d8c0bdf8fb76b0d813e3d1.1594396418.git.berto@igalia.com>
[mreitz: Fixed comment style]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 10:20:15 +02:00
Alberto Garcia 2118771ddf qcow2: Allow preallocation and backing files if extended_l2 is set
Traditional qcow2 images don't allow preallocation if a backing file
is set. This is because once a cluster is allocated there is no way to
tell that its data should be read from the backing file.

Extended L2 entries have individual allocation bits for each
subcluster, and therefore it is perfectly possible to have an
allocated cluster with all its subclusters unallocated.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <6d5b0f38e7dc5f2f31d8cab1cb92044e9909aece.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 09:20:04 +02:00
Alberto Garcia 7be2025258 qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit
Now that the implementation of subclusters is complete we can finally
add the necessary options to create and read images with this feature,
which we call "extended L2 entries".

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <6476caaa73216bd05b7bb2d504a20415e1665176.1594396418.git.berto@igalia.com>
[mreitz: %s/5\.1/5.2/; fixed 302's and 303's reference output]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 09:19:55 +02:00
Alberto Garcia 40dee94320 qcow2: Add prealloc field to QCowL2Meta
This field allows us to indicate that the L2 metadata update does not
come from a write request with actual data but from a preallocation
request.

For traditional images this does not make any difference, but for
images with extended L2 entries this means that the clusters are
allocated normally in the L2 table but individual subclusters are
marked as unallocated.

This will allow preallocating images that have a backing file.

There is one special case: when we resize an existing image we can
also request that the new clusters are preallocated. If the image
already had a backing file then we have to hide any possible stale
data and zero out the new clusters (see commit 955c7d6687 for more
details).

In this case the subclusters cannot be left as unallocated so the L2
bitmap must be updated.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <960d4c444a4f5a870e2b47e5da322a73cd9a2f5a.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia 0dd07b298f qcow2: Add subcluster support to qcow2_measure()
Extended L2 entries are bigger than normal L2 entries so this has an
impact on the amount of metadata needed for a qcow2 file.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <7efae2efd5e36b42d2570743a12576d68ce53685.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia a6841a2de6 qcow2: Add subcluster support to qcow2_co_pwrite_zeroes()
This works now at the subcluster level and pwrite_zeroes_alignment is
updated accordingly.

qcow2_cluster_zeroize() is turned into qcow2_subcluster_zeroize() with
the following changes:

   - The request can now be subcluster-aligned.

   - The cluster-aligned body of the request is still zeroized using
     zero_in_l2_slice() as before.

   - The subcluster-aligned head and tail of the request are zeroized
     with the new zero_l2_subclusters() function.

There is just one thing to take into account for a possible future
improvement: compressed clusters cannot be partially zeroized so
zero_l2_subclusters() on the head or the tail can return -ENOTSUP.
This makes the caller repeat the *complete* request and write actual
zeroes to disk. This is sub-optimal because

   1) if the head area was compressed we would still be able to use
      the fast path for the body and possibly the tail.

   2) if the tail area was compressed we are writing zeroes to the
      head and the body areas, which are already zeroized.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <17e05e2ee7e12f10dcf012da81e83ebe27eb3bef.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia bf4a66eed4 qcow2: Add subcluster support to handle_alloc_space()
The bdrv_co_pwrite_zeroes() call here fills complete clusters with
zeroes, but it can happen that some subclusters are not part of the
write request or the copy-on-write. This patch makes sure that only
the affected subclusters are overwritten.

A potential improvement would be to also fill with zeroes the other
subclusters if we can guarantee that we are not overwriting existing
data. However this would waste more disk space, so we should first
evaluate if it's really worth doing.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <b3dc97e8e2240ddb5191a4f930e8fc9653f94621.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia ff4cdec7f6 qcow2: Clear the L2 bitmap when allocating a compressed cluster
Compressed clusters always have the bitmap part of the extended L2
entry set to 0.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <04455b3de5dfeb9d1cfe1fc7b02d7060a6e09710.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia aca00cd971 qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()
The L2 bitmap needs to be updated after each write to indicate what
new subclusters are now allocated. This needs to happen even if the
cluster was already allocated and the L2 entry was otherwise valid.

In some cases however a write operation doesn't need change the L2
bitmap (because all affected subclusters were already allocated). This
is detected in calculate_l2_meta(), and qcow2_alloc_cluster_link_l2()
is never called in those cases.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <0875620d49f44320334b6a91c73b3f301f975f38.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia fc2e6528d5 qcow2: Add subcluster support to check_refcounts_l2()
The offset field of an uncompressed cluster's L2 entry must be aligned
to the cluster size, otherwise it is invalid. If the cluster has no
data then it means that the offset points to a preallocation, so we
can clear the offset field without affecting the guest-visible data.
This is what 'qemu-img check' does when run in repair mode.

On traditional qcow2 images this can only happen when QCOW_OFLAG_ZERO
is set, and repairing such entries turns the clusters from ZERO_ALLOC
into ZERO_PLAIN.

Extended L2 entries have no ZERO_ALLOC clusters and no QCOW_OFLAG_ZERO
but the idea is the same: if none of the subclusters are allocated
then we can clear the offset field and leave the bitmap untouched.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <9f4ed1d0a34b0a545b032c31ecd8c14734065342.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia a68cd70326 qcow2: Add subcluster support to discard_in_l2_slice()
Two things need to be taken into account here:

1) With full_discard == true the L2 entry must be cleared completely.
   This also includes the L2 bitmap if the image has extended L2
   entries.

2) With full_discard == false we have to make the discarded cluster
   read back as zeroes. With normal L2 entries this is done with the
   QCOW_OFLAG_ZERO bit, whereas with extended L2 entries this is done
   with the individual 'all zeroes' bits for each subcluster.

   Note however that QCOW_OFLAG_ZERO is not supported in v2 qcow2
   images so, if there is a backing file, discard cannot guarantee
   that the image will read back as zeroes. If this is important for
   the caller it should forbid it as qcow2_co_pdiscard() does (see
   80f5c01183 for more details).

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <5ef8274e628aa3ab559bfac467abf488534f2b76.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia 205fa50750 qcow2: Add subcluster support to zero_in_l2_slice()
The QCOW_OFLAG_ZERO bit that indicates that a cluster reads as
zeroes is only used in standard L2 entries. Extended L2 entries use
individual 'all zeroes' bits for each subcluster.

This must be taken into account when updating the L2 entry and also
when deciding that an existing entry does not need to be updated.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <b61d61606d8c9b367bd641ab37351ddb9172799a.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia 3f9c6b3b1f qcow2: Add subcluster support to qcow2_get_host_offset()
The logic of this function remains pretty much the same, except that
it uses count_contiguous_subclusters(), which combines the logic of
count_contiguous_clusters() / count_contiguous_clusters_unallocated()
and checks individual subclusters.

qcow2_cluster_to_subcluster_type() is not necessary as a separate
function anymore so it's inlined into its caller.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <d2193fd48653a350d80f0eca1c67b1d9053fb2f3.1594396418.git.berto@igalia.com>
[mreitz: Initialize expected_type to anything]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia d53ec3d8d8 qcow2: Add subcluster support to calculate_l2_meta()
If an image has subclusters then there are more copy-on-write
scenarios that we need to consider. Let's say we have a write request
from the middle of subcluster #3 until the end of the cluster:

1) If we are writing to a newly allocated cluster then we need
   copy-on-write. The previous contents of subclusters #0 to #3 must
   be copied to the new cluster. We can optimize this process by
   skipping all leading unallocated or zero subclusters (the status of
   those skipped subclusters will be reflected in the new L2 bitmap).

2) If we are overwriting an existing cluster:

   2.1) If subcluster #3 is unallocated or has the all-zeroes bit set
        then we need copy-on-write (on subcluster #3 only).

   2.2) If subcluster #3 was already allocated then there is no need
        for any copy-on-write. However we still need to update the L2
        bitmap to reflect possible changes in the allocation status of
        subclusters #4 to #31. Because of this, this function checks
        if all the overwritten subclusters are already allocated and
        in this case it returns without creating a new QCowL2Meta
        structure.

After all these changes l2meta_cow_start() and l2meta_cow_end()
are not necessarily cluster-aligned anymore. We need to update the
calculation of old_start and old_end in handle_dependencies() to
guarantee that no two requests try to write on the same cluster.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <4292dd56e4446d386a2fe307311737a711c00708.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia 97490a143e qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
When dealing with subcluster types there is a new value called
QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC that has no equivalent in
QCow2ClusterType.

This patch handles that value in all places where subcluster types
are processed.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <bf09e2e2439a468a901bb96ace411eed9ee50295.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia 10dabdc596 qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*
In order to support extended L2 entries some functions of the qcow2
driver need to start dealing with subclusters instead of clusters.

qcow2_get_host_offset() is modified to return the subcluster type
instead of the cluster type, and all callers are updated to replace
all values of QCow2ClusterType with their QCow2SubclusterType
equivalents.

This patch only changes the data types, there are no semantic changes.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <f6c29737c295f32cbee74c903c30b01820363b34.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia ca4a0bb81b qcow2: Add cluster type parameter to qcow2_get_host_offset()
This function returns an integer that can be either an error code or a
cluster type (a value from the QCow2ClusterType enum).

We are going to start using subcluster types instead of cluster types
in some functions so it's better to use the exact data types instead
of integers for clarity and in order to detect errors more easily.

This patch makes qcow2_get_host_offset() return 0 on success and
puts the returned cluster type in a separate parameter. There are no
semantic changes.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <396b6eab1859a271551dcd7dcba77f8934aa3c3f.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia c94d037825 qcow2: Add qcow2_cluster_is_allocated()
This helper function tells us if a cluster is allocated (that is,
there is an associated host offset for it).

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <6d8771c5c79cbdc6c519875a5078e1cc85856d63.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia 70d1cbae03 qcow2: Add qcow2_get_subcluster_range_type()
There are situations in which we want to know how many contiguous
subclusters of the same type there are in a given cluster. This can be
done by simply iterating over the subclusters and repeatedly calling
qcow2_get_subcluster_type() for each one of them.

However once we determined the type of a subcluster we can check the
rest efficiently by counting the number of adjacent ones (or zeroes)
in the bitmap. This is what this function does.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <db917263d568ec6ffb4a41cac3c9100f96bf6c18.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia 34905d8eb1 qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type()
This patch adds QCow2SubclusterType, which is the subcluster-level
version of QCow2ClusterType. All QCOW2_SUBCLUSTER_* values have the
the same meaning as their QCOW2_CLUSTER_* equivalents (when they
exist). See below for details and caveats.

In images without extended L2 entries clusters are treated as having
exactly one subcluster so it is possible to replace one data type with
the other while keeping the exact same semantics.

With extended L2 entries there are new possible values, and every
subcluster in the same cluster can obviously have a different
QCow2SubclusterType so functions need to be adapted to work on the
subcluster level.

There are several things that have to be taken into account:

  a) QCOW2_SUBCLUSTER_COMPRESSED means that the whole cluster is
     compressed. We do not support compression at the subcluster
     level.

  b) There are two different values for unallocated subclusters:
     QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN which means that the whole
     cluster is unallocated, and QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
     which means that the cluster is allocated but the subcluster is
     not. The latter can only happen in images with extended L2
     entries.

  c) QCOW2_SUBCLUSTER_INVALID is used to detect the cases where an L2
     entry has a value that violates the specification. The caller is
     responsible for handling these situations.

     To prevent compatibility problems with images that have invalid
     values but are currently being read by QEMU without causing side
     effects, QCOW2_SUBCLUSTER_INVALID is only returned for images
     with extended L2 entries.

qcow2_cluster_to_subcluster_type() is added as a separate function
from qcow2_get_subcluster_type(), but this is only temporary and both
will be merged in a subsequent patch.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <26ef38e270f25851c98b51278852b4c4a7f97e69.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia 39a9f0a50e qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()
Extended L2 entries are 128-bit wide: 64 bits for the entry itself and
64 bits for the subcluster allocation bitmap.

In order to support them correctly get/set_l2_entry() need to be
updated so they take the entry width into account in order to
calculate the correct offset.

This patch also adds the get/set_l2_bitmap() functions that are
used to access the bitmaps. For convenience we allow calling
get_l2_bitmap() on images without subclusters. In this case the
returned value is always 0 and has no meaning.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <6ee0f81ae3329c991de125618b3675e1e46acdbb.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia c8fd8554d9 qcow2: Add l2_entry_size()
qcow2 images with subclusters have 128-bit L2 entries. The first 64
bits contain the same information as traditional images and the last
64 bits form a bitmap with the status of each individual subcluster.

Because of that we cannot assume that L2 entries are sizeof(uint64_t)
anymore. This function returns the proper value for the image.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <d34d578bd0380e739e2dde3e8dd6187d3d249fa9.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia 3e71981592 qcow2: Add offset_into_subcluster() and size_to_subclusters()
Like offset_into_cluster() and size_to_clusters(), but for
subclusters.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <3cc2390dcdef3d234d47c741b708bd8734490862.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia a53e8b7202 qcow2: Add offset_to_sc_index()
For a given offset, return the subcluster number within its cluster
(i.e. with 32 subclusters per cluster it returns a number between 0
and 31).

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <56e3e4ac0d827c6a2f5f259106c5ddb7c4ca2653.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia d0346b5591 qcow2: Add subcluster-related fields to BDRVQcow2State
This patch adds the following new fields to BDRVQcow2State:

- subclusters_per_cluster: Number of subclusters in a cluster
- subcluster_size: The size of each subcluster, in bytes
- subcluster_bits: No. of bits so 1 << subcluster_bits = subcluster_size

Images without subclusters are treated as if they had exactly one
subcluster per cluster (i.e. subcluster_size = cluster_size).

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <55bfeac86b092fa2c9d182a95cbeb479ff7eca4f.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia a3c7d91625 qcow2: Add dummy has_subclusters() function
This function will be used by the qcow2 code to check if an image has
subclusters or not.

At the moment this simply returns false. Once all patches needed for
subcluster support are ready then QEMU will be able to create and
read images with subclusters and this function will return the actual
value.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <905526221083581a1b7057bca1585487661c5c13.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia 12c6aebedf qcow2: Add get_l2_entry() and set_l2_entry()
The size of an L2 entry is 64 bits, but if we want to have subclusters
we need extended L2 entries. This means that we have to access L2
tables and slices differently depending on whether an image has
extended L2 entries or not.

This patch replaces all l2_slice[] accesses with calls to
get_l2_entry() and set_l2_entry().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <9586363531fec125ba1386e561762d3e4224e9fc.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia 57538c864f qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()
When writing to a qcow2 file there are two functions that take a
virtual offset and return a host offset, possibly allocating new
clusters if necessary:

   - handle_copied() looks for normal data clusters that are already
     allocated and have a reference count of 1. In those clusters we
     can simply write the data and there is no need to perform any
     copy-on-write.

   - handle_alloc() looks for clusters that do need copy-on-write,
     either because they haven't been allocated yet, because their
     reference count is != 1 or because they are ZERO_ALLOC clusters.

The ZERO_ALLOC case is a bit special because those are clusters that
are already allocated and they could perfectly be dealt with in
handle_copied() (as long as copy-on-write is performed when required).

In fact, there is extra code specifically for them in handle_alloc()
that tries to reuse the existing allocation if possible and frees them
otherwise.

This patch changes the handling of ZERO_ALLOC clusters so the
semantics of these two functions are now like this:

   - handle_copied() looks for clusters that are already allocated and
     which we can overwrite (NORMAL and ZERO_ALLOC clusters with a
     reference count of 1).

   - handle_alloc() looks for clusters for which we need a new
     allocation (all other cases).

One important difference after this change is that clusters found
in handle_copied() may now require copy-on-write, but this will be
necessary anyway once we add support for subclusters.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <eb17fc938f6be7be2e8d8ff42763d2c19241f866.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia c1587d877e qcow2: Split cluster_needs_cow() out of count_cow_clusters()
We are going to need it in other places.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <65e5d9627ca2ebe7e62deaeddf60949c33067d9d.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia 8f91d6906c qcow2: Add calculate_l2_meta()
handle_alloc() creates a QCowL2Meta structure in order to update the
image metadata and perform the necessary copy-on-write operations.

This patch moves that code to a separate function so it can be used
from other places.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <e5bc4a648dac31972bfa7a0e554be8064be78799.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia 388e581615 qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset()
qcow2_get_cluster_offset() takes an (unaligned) guest offset and
returns the (aligned) offset of the corresponding cluster in the qcow2
image.

In practice none of the callers need to know where the cluster starts
so this patch makes the function calculate and return the final host
offset directly. The function is also renamed accordingly.

There is a pre-existing exception with compressed clusters: in this
case the function returns the complete cluster descriptor (containing
the offset and size of the compressed data). This does not change with
this patch but it is now documented.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <ffae6cdc5ca8950e8280ac0f696dcc376cb07095.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Alberto Garcia 9c4269d54b qcow2: Make Qcow2AioTask store the full host offset
The file_cluster_offset field of Qcow2AioTask stores a cluster-aligned
host offset. In practice this is not very useful because all users(*)
of this structure need the final host offset into the cluster, which
they calculate using

   host_offset = file_cluster_offset + offset_into_cluster(s, offset)

There is no reason why Qcow2AioTask cannot store host_offset directly
and that is what this patch does.

(*) compressed clusters are the exception: in this case what
    file_cluster_offset was storing was the full compressed cluster
    descriptor (offset + size). This does not change with this patch
    but it is documented now.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <07c4b15c644dcf06c9459f98846ac1c4ea96e26f.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-25 08:33:20 +02:00
Marc-André Lureau 5e5733e599 meson: convert block
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-08-21 06:30:18 -04:00
Paolo Bonzini 243af0225a trace: switch position of headers to what Meson requires
Meson doesn't enjoy the same flexibility we have with Make in choosing
the include path.  In particular the tracing headers are using
$(build_root)/$(<D).

In order to keep the include directives unchanged,
the simplest solution is to generate headers with patterns like
"trace/trace-audio.h" and place forwarding headers in the source tree
such that for example "audio/trace.h" includes "trace/trace-audio.h".

This patch is too ugly to be applied to the Makefiles now.  It's only
a way to separate the changes to the tracing header files from the
Meson rewrite of the tracing logic.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-08-21 06:18:24 -04:00
Stefan Reiter 7661a886a1 block/block-copy: always align copied region to cluster size
Since commit 42ac214406 (block/block-copy: refactor task creation)
block_copy_task_create calculates the area to be copied via
bdrv_dirty_bitmap_next_dirty_area, but that can return an unaligned byte
count if the image's last cluster end is not aligned to the bitmap's
granularity.

Always ALIGN_UP the resulting bytes value to satisfy block_copy_do_copy,
which requires the 'bytes' parameter to be aligned to cluster size.

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Message-Id: <20200810095523.15071-1-s.reiter@proxmox.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-08-10 17:12:46 +02:00
Tuguoyi 348fcc4f7a qcow2-cluster: Fix integer left shift error in qcow2_alloc_cluster_link_l2()
When calculating the offset, the result of left shift operation will be promoted
to type int64 automatically because the left operand of + operator is uint64_t.
but the result after integer promotion may be produce an error value for us and
trigger the following asserting error.

For example, consider i=0x2000, cluster_bits=18, the result of left shift
operation will be 0x80000000. Cause argument i is of signed integer type,
the result is automatically promoted to 0xffffffff80000000 which is not
we expected

The way to trigger the assertion error:
  qemu-img create -f qcow2 -o preallocation=full,cluster_size=256k tmpdisk 10G

This patch fix it by casting @i to uint64_t before doing left shift operation

Signed-off-by: Guoyi Tu <tu.guoyi@h3c.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 81ba90fe0c014f269621c283269b42ad@h3c.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-08-05 14:56:11 +01:00
Max Reitz fe16c7ddf8 qcow2: Release read-only bitmaps when inactivated
During migration, we release all bitmaps after storing them on disk, as
long as they are (1) stored on disk, (2) not read-only, and (3)
consistent.

(2) seems arbitrary, though.  The reason we do not release them is
because we do not write them, as there is no need to; and then we just
forget about all bitmaps that we have not written to the file.  However,
read-only persistent bitmaps are still in the file and in sync with
their in-memory representation, so we may as well release them just like
any R/W bitmap that we have updated.

It leads to actual problems, too: After migration, letting the source
continue may result in an error if there were any bitmaps on read-only
nodes (such as backing images), because those have not been released by
bdrv_inactive_all(), but bdrv_invalidate_cache_all() attempts to reload
them (which fails, because they are still present in memory).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200730120234.49288-2-mreitz@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-08-03 08:59:37 -05:00
Peter Maydell 5045be872d nbd patches for 2020-07-28
- fix NBD handling of trim/zero requests larger than 2G
 - allow no-op resizes on NBD (in turn fixing qemu-img convert -c into NBD)
 - several deadlock fixes when using NBD reconnect
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAl8gPV4ACgkQp6FrSiUn
 Q2ozdQgAiDHaHG2NX4jmduID7677/XhsLoVl1MV7UZnU+y9qQ2p+Mbsw1oMneu8P
 Dtfgx/mlWVGu68gn31f4xVq74VTZH6p3IGV7PMcYZ50xbESoFs6CYUwUWUp1GeC3
 +kPOl0EpLvm1W/V93sKmg8FflGmNiJHNkfl/ddfk0gs6Z3EfjkmGJt7IP/pv1UCs
 4icWvCJsqw2z8TnEwtTpMX5HZlWth1x37lUOShlPL5kA5hZqU+zYU/bYB5iKx+16
 MebYg7C7CXYCCtH9cDH/swUWhOdQLkywA6yBAwc1zENsKy84aIAJIUls/Ji0q6CY
 A4s5c0FovLBuMDd9oLr0kJbkJQeVZA==
 =DD6l
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2020-07-28' into staging

nbd patches for 2020-07-28

- fix NBD handling of trim/zero requests larger than 2G
- allow no-op resizes on NBD (in turn fixing qemu-img convert -c into NBD)
- several deadlock fixes when using NBD reconnect

# gpg: Signature made Tue 28 Jul 2020 15:59:42 BST
# gpg:                using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full]
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full]
# gpg:                 aka "[jpeg image of size 6874]" [full]
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-nbd-2020-07-28:
  block/nbd: nbd_co_reconnect_loop(): don't sleep if drained
  block/nbd: on shutdown terminate connection attempt
  block/nbd: allow drain during reconnect attempt
  block/nbd: split nbd_establish_connection out of nbd_client_connect
  iotests: Test convert to qcow2 compressed to NBD
  iotests: Add more qemu_img helpers
  iotests: Make qemu_nbd_popen() a contextmanager
  block: nbd: Fix convert qcow2 compressed to nbd
  nbd: Fix large trim/zero requests

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-07-28 20:43:03 +01:00
Peter Maydell 0c4fa5bc1a Block patches for 5.1.0:
- Fix block I/O for split transfers
 - Fix iotest 197 for non-qcow2 formats
 -----BEGIN PGP SIGNATURE-----
 
 iQFGBAABCAAwFiEEkb62CjDbPohX0Rgp9AfbAGHVz0AFAl8gK/gSHG1yZWl0ekBy
 ZWRoYXQuY29tAAoJEPQH2wBh1c9AR+kIALv+Z/A6SPpsAHjpyuRbluuhznfqPuiX
 mIVX0qNhsFBDAUVw1tOkMtfxOIvuaQW/QWzM0UPaHqB/I4ckzE6Dp98ys9uwHPdq
 ez23blWvBuB3P3y2ZBAYhhRlCqt3w4uI/lIJMu7VZBghXxj3fGcuTnLlWx8gb1IH
 74MiBX8XPt532FiFTnpzxgns8NYkZY8mF6zduGqBPx6bPmdNdDfqAhL68Fv8uKJA
 k4dVH6ffPLZD+RrCz9GL5rsYQ6NR6tfyEoRMPqtJznhtzWwu5h5EF3p46VkcKheI
 k0axygEBAr9JbeCwbIK3a4hjQ7eaFQ6j9JR+lPZBRaDbLHv/xGNNuvw=
 =C4Lq
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2020-07-28' into staging

Block patches for 5.1.0:
- Fix block I/O for split transfers
- Fix iotest 197 for non-qcow2 formats

# gpg: Signature made Tue 28 Jul 2020 14:45:28 BST
# gpg:                using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40
# gpg:                issuer "mreitz@redhat.com"
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full]
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* remotes/maxreitz/tags/pull-block-2020-07-28:
  iotests/197: Fix for non-qcow2 formats
  iotests/028: Add test for cross-base-EOF reads
  block: Fix bdrv_aligned_p*v() for qiov_offset != 0

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-07-28 18:00:21 +01:00
Vladimir Sementsov-Ogievskiy 12c75e20a2 block/nbd: nbd_co_reconnect_loop(): don't sleep if drained
We try to go to wakeable sleep, so that, if drain begins it will break
the sleep. But what if nbd_client_co_drain_begin() already called and
s->drained is already true? We'll go to sleep, and drain will have to
wait for the whole timeout. Let's improve it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200727184751.15704-5-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-07-28 09:54:43 -05:00
Vladimir Sementsov-Ogievskiy fbeb3e63b3 block/nbd: on shutdown terminate connection attempt
On shutdown nbd driver may be in a connecting state. We should shutdown
it as well, otherwise we may hang in
nbd_teardown_connection, waiting for conneciton_co to finish in
BDRV_POLL_WHILE(bs, s->connection_co) loop if remote server is down.

How to reproduce the dead lock:

1. Create nbd-fault-injector.conf with the following contents:

[inject-error "mega1"]
event=data
io=readwrite
when=before

2. In one terminal run nbd-fault-injector in a loop, like this:

n=1; while true; do
    echo $n; ((n++));
    ./nbd-fault-injector.py 127.0.0.1:10000 nbd-fault-injector.conf;
done

3. In another terminal run qemu-io in a loop, like this:

n=1; while true; do
    echo $n; ((n++));
    ./qemu-io -c 'read 0 512' nbd://127.0.0.1:10000;
done

After some time, qemu-io will hang. Note, that this hang may be
triggered by another bug, so the whole case is fixed only together with
commit "block/nbd: allow drain during reconnect attempt".

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200727184751.15704-4-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-07-28 09:54:43 -05:00
Vladimir Sementsov-Ogievskiy dd1ec1a4af block/nbd: allow drain during reconnect attempt
It should be safe to reenter qio_channel_yield() on io/channel read/write
path, so it's safe to reduce in_flight and allow attaching new aio
context. And no problem to allow drain itself: connection attempt is
not a guest request. Moreover, if remote server is down, we can hang
in negotiation, blocking drain section and provoking a dead lock.

How to reproduce the dead lock:

1. Create nbd-fault-injector.conf with the following contents:

[inject-error "mega1"]
event=data
io=readwrite
when=before

2. In one terminal run nbd-fault-injector in a loop, like this:

n=1; while true; do
    echo $n; ((n++));
    ./nbd-fault-injector.py 127.0.0.1:10000 nbd-fault-injector.conf;
done

3. In another terminal run qemu-io in a loop, like this:

n=1; while true; do
    echo $n; ((n++));
    ./qemu-io -c 'read 0 512' nbd://127.0.0.1:10000;
done

After some time, qemu-io will hang trying to drain, for example, like
this:

 #3 aio_poll (ctx=0x55f006bdd890, blocking=true) at
    util/aio-posix.c:600
 #4 bdrv_do_drained_begin (bs=0x55f006bea710, recursive=false,
    parent=0x0, ignore_bds_parents=false, poll=true) at block/io.c:427
 #5 bdrv_drained_begin (bs=0x55f006bea710) at block/io.c:433
 #6 blk_drain (blk=0x55f006befc80) at block/block-backend.c:1710
 #7 blk_unref (blk=0x55f006befc80) at block/block-backend.c:498
 #8 bdrv_open_inherit (filename=0x7fffba1563bc
    "nbd+tcp://127.0.0.1:10000", reference=0x0, options=0x55f006be86d0,
    flags=24578, parent=0x0, child_class=0x0, child_role=0,
    errp=0x7fffba154620) at block.c:3491
 #9 bdrv_open (filename=0x7fffba1563bc "nbd+tcp://127.0.0.1:10000",
    reference=0x0, options=0x0, flags=16386, errp=0x7fffba154620) at
    block.c:3513
 #10 blk_new_open (filename=0x7fffba1563bc "nbd+tcp://127.0.0.1:10000",
    reference=0x0, options=0x0, flags=16386, errp=0x7fffba154620) at
    block/block-backend.c:421

And connection_co stack like this:

 #0 qemu_coroutine_switch (from_=0x55f006bf2650, to_=0x7fe96e07d918,
    action=COROUTINE_YIELD) at util/coroutine-ucontext.c:302
 #1 qemu_coroutine_yield () at util/qemu-coroutine.c:193
 #2 qio_channel_yield (ioc=0x55f006bb3c20, condition=G_IO_IN) at
    io/channel.c:472
 #3 qio_channel_readv_all_eof (ioc=0x55f006bb3c20, iov=0x7fe96d729bf0,
    niov=1, errp=0x7fe96d729eb0) at io/channel.c:110
 #4 qio_channel_readv_all (ioc=0x55f006bb3c20, iov=0x7fe96d729bf0,
    niov=1, errp=0x7fe96d729eb0) at io/channel.c:143
 #5 qio_channel_read_all (ioc=0x55f006bb3c20, buf=0x7fe96d729d28
    "\300.\366\004\360U", buflen=8, errp=0x7fe96d729eb0) at
    io/channel.c:247
 #6 nbd_read (ioc=0x55f006bb3c20, buffer=0x7fe96d729d28, size=8,
    desc=0x55f004f69644 "initial magic", errp=0x7fe96d729eb0) at
    /work/src/qemu/master/include/block/nbd.h:365
 #7 nbd_read64 (ioc=0x55f006bb3c20, val=0x7fe96d729d28,
    desc=0x55f004f69644 "initial magic", errp=0x7fe96d729eb0) at
    /work/src/qemu/master/include/block/nbd.h:391
 #8 nbd_start_negotiate (aio_context=0x55f006bdd890,
    ioc=0x55f006bb3c20, tlscreds=0x0, hostname=0x0,
    outioc=0x55f006bf19f8, structured_reply=true,
    zeroes=0x7fe96d729dca, errp=0x7fe96d729eb0) at nbd/client.c:904
 #9 nbd_receive_negotiate (aio_context=0x55f006bdd890,
    ioc=0x55f006bb3c20, tlscreds=0x0, hostname=0x0,
    outioc=0x55f006bf19f8, info=0x55f006bf1a00, errp=0x7fe96d729eb0) at
    nbd/client.c:1032
 #10 nbd_client_connect (bs=0x55f006bea710, errp=0x7fe96d729eb0) at
    block/nbd.c:1460
 #11 nbd_reconnect_attempt (s=0x55f006bf19f0) at block/nbd.c:287
 #12 nbd_co_reconnect_loop (s=0x55f006bf19f0) at block/nbd.c:309
 #13 nbd_connection_entry (opaque=0x55f006bf19f0) at block/nbd.c:360
 #14 coroutine_trampoline (i0=113190480, i1=22000) at
    util/coroutine-ucontext.c:173

Note, that the hang may be
triggered by another bug, so the whole case is fixed only together with
commit "block/nbd: on shutdown terminate connection attempt".

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200727184751.15704-3-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-07-28 09:54:43 -05:00
Vladimir Sementsov-Ogievskiy fa35591b9c block/nbd: split nbd_establish_connection out of nbd_client_connect
We are going to implement non-blocking version of
nbd_establish_connection, which for a while will be used only for
nbd_reconnect_attempt, not for nbd_open, so we need to call it
separately.

Refactor nbd_reconnect_attempt in a way which makes next commit
simpler.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200727184751.15704-2-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-07-28 09:54:43 -05:00
Nir Soffer a2b333c018 block: nbd: Fix convert qcow2 compressed to nbd
When converting to qcow2 compressed format, the last step is a special
zero length compressed write, ending in a call to bdrv_co_truncate(). This
call always fails for the nbd driver since it does not implement
bdrv_co_truncate().

For block devices, which have the same limits, the call succeeds since
the file driver implements bdrv_co_truncate(). If the caller asked to
truncate to the same or smaller size with exact=false, the truncate
succeeds. Implement the same logic for nbd.

Example failing without this change:

In one shell start qemu-nbd:

$ truncate -s 1g test.tar
$ qemu-nbd --socket=/tmp/nbd.sock --persistent --format=raw --offset 1536 test.tar

In another shell convert an image to qcow2 compressed via NBD:

$ echo "disk data" > disk.raw
$ truncate -s 1g disk.raw
$ qemu-img convert -f raw -O qcow2 -c disk1.raw nbd+unix:///?socket=/tmp/nbd.sock; echo $?
1

qemu-img failed, but the conversion was successful:

$ qemu-img info nbd+unix:///?socket=/tmp/nbd.sock
image: nbd+unix://?socket=/tmp/nbd.sock
file format: qcow2
virtual size: 1 GiB (1073741824 bytes)
...

$ qemu-img check nbd+unix:///?socket=/tmp/nbd.sock
No errors were found on the image.
1/16384 = 0.01% allocated, 100.00% fragmented, 100.00% compressed clusters
Image end offset: 393216

$ qemu-img compare disk.raw nbd+unix:///?socket=/tmp/nbd.sock
Images are identical.

Fixes: https://bugzilla.redhat.com/1860627
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Message-Id: <20200727215846.395443-2-nsoffer@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: typo fixes]
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-07-28 09:54:19 -05:00
Peter Maydell 2649915121 bitmaps patches for 2020-07-27
- Improve handling of various post-copy bitmap migration scenarios. A lost
 bitmap should merely mean that the next backup must be full rather than
 incremental, rather than abruptly breaking the entire guest migration.
 - Associated iotest improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAl8fPRkACgkQp6FrSiUn
 Q2qanQf/dRTrqZ7/hs8aENySf44o0dBzOLZr+FBcrqEj2sd0c6jPzV2X5CVtnA1v
 gBgKJJGLpti3mSeNQDbaXZIQrsesBAuxvJsc6vZ9npDCdMYnK/qPE3Zfw1bx12qR
 cb39ba28P4izgs216h92ZACtUewnvjkxyJgN7zfmCJdNcwZINMUItAS183tSbQjn
 n39Wb7a+umsRgV9HQv/6cXlQIPqFMyAOl5kkzV3evuw7EBoHFnNq4cjPrUnjkqiD
 xf2pcSomaedYd37SpvoH57JxfL3z/90OBcuXhFvbqFk4FgQ63rJ32nRve2ZbIDI0
 XPbohnYjYoFv6Xs/jtTzctZCbZ+jTg==
 =1dmz
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ericb/tags/pull-bitmaps-2020-07-27' into staging

bitmaps patches for 2020-07-27

- Improve handling of various post-copy bitmap migration scenarios. A lost
bitmap should merely mean that the next backup must be full rather than
incremental, rather than abruptly breaking the entire guest migration.
- Associated iotest improvements

# gpg: Signature made Mon 27 Jul 2020 21:46:17 BST
# gpg:                using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full]
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full]
# gpg:                 aka "[jpeg image of size 6874]" [full]
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-bitmaps-2020-07-27: (24 commits)
  migration: Fix typos in bitmap migration comments
  iotests: Adjust which migration tests are quick
  qemu-iotests/199: add source-killed case to bitmaps postcopy
  qemu-iotests/199: add early shutdown case to bitmaps postcopy
  qemu-iotests/199: check persistent bitmaps
  qemu-iotests/199: prepare for new test-cases addition
  migration/savevm: don't worry if bitmap migration postcopy failed
  migration/block-dirty-bitmap: cancel migration on shutdown
  migration/block-dirty-bitmap: relax error handling in incoming part
  migration/block-dirty-bitmap: keep bitmap state for all bitmaps
  migration/block-dirty-bitmap: simplify dirty_bitmap_load_complete
  migration/block-dirty-bitmap: rename finish_lock to just lock
  migration/block-dirty-bitmap: refactor state global variables
  migration/block-dirty-bitmap: move mutex init to dirty_bitmap_mig_init
  migration/block-dirty-bitmap: rename dirty_bitmap_mig_cleanup
  migration/block-dirty-bitmap: rename state structure types
  migration/block-dirty-bitmap: fix dirty_bitmap_mig_before_vm_start
  qemu-iotests/199: increase postcopy period
  qemu-iotests/199: change discard patterns
  qemu-iotests/199: improve performance: set bitmap by discard
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-07-28 14:38:17 +01:00
Max Reitz 134b7dec6e block: Fix bdrv_aligned_p*v() for qiov_offset != 0
Since these functions take a @qiov_offset, they must always take it into
account when working with @qiov.  There are a couple of places where
they do not, but they should.

Fixes: 65cd4424b9
       ("block/io: bdrv_aligned_preadv: use and support qiov_offset")
Fixes: 28c4da2869
       ("block/io: bdrv_aligned_pwritev: use and support qiov_offset")
Reported-by: Claudio Fontana <cfontana@suse.de>
Reported-by: Bruce Rogers <brogers@suse.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200728120806.265916-2-mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Tested-by: Claudio Fontana <cfontana@suse.de>
Tested-by: Bruce Rogers <brogers@suse.com>
2020-07-28 15:28:47 +02:00
Andrey Shinkevich 8098969cf2 qcow2: Fix capitalization of header extension constant.
Make the capitalization of the hexadecimal numbers consistent for the
QCOW2 header extension constants in docs/interop/qcow2.txt.

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <1594973699-781898-2-git-send-email-andrey.shinkevich@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-07-27 15:39:58 -05:00
Max Reitz 984c367814 block/amend: Check whether the node exists
We should check whether the user-specified node-name actually refers to
a node.  The simplest way to do that is to use bdrv_lookup_bs() instead
of bdrv_find_node() (the former wraps the latter, and produces an error
message if necessary).

Reported-by: Coverity (CID 1430268)
Fixes: ced914d0ab
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200710095037.10885-1-mreitz@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
2020-07-27 12:37:25 +02:00
Peter Maydell 0c1fd2f41f Block layer patches:
- file-posix: Handle `EINVAL` fallocate return value
 - qemu-img convert -n: Keep qcow2 v2 target sparse
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAl8XDZgRHGt3b2xmQHJl
 ZGhhdC5jb20ACgkQfwmycsiPL9bpFQ/9EnS4iV9w0KW/NuJ4FIVdBD/VZFzokDLi
 1vXVVEjoxAxxiP8KlGM9HRi5NtvOMgzKhNGias0wOFiBorx8Ppfc+3sqwygc2dnw
 Vbl/od2D7xQZkddnp4Upo70m+eWRW6xaxX+lAcl6iS3gBPDwExLaYfBN8lFUyRrs
 T4C0miD+abEEyL3C5A4cEZJ7CIs0n7AqZkqgytWA7clwy79VgDSuMOgP6DOP1tGH
 1uK4gMCB0xbn+PHk96lXPORcLwDBOP0PIluo/zBmffzsEZN1Lv5ddVmxMQWSivin
 UmAbpeEtSw9Py5lRVmLSBYvolVOUleE/Rlzad2iue2be5/G8VP8xiRYMp9mUVpLO
 +LPMUd9NRkPx7wjUJMPKF0G9FgVO7R0+9J6rC33aKBj2XAlxY6qQlqUN2Jo11/fK
 2+9AkU7WVqx3vuW2Zz7wjq3Rjvpg/sK+V3P3Cm6HTwwwPbEwv8GcFe6eKdvJrZ9K
 hhwiFSUOd90OUAdKOQXKMFSZ/t1TrZhdX882Hvth11/AlQAUY4cxQbSKcc2nrvLu
 Axk0Va3haOD+ReRTs8W/iYNdrXGZmbr3MCkNiK3QSvnrdj602ompco7xyDTX1/qH
 6Hu28q7jUG3p3cApLQIZVjmogfqcGU7SWIY4lp9HZqtGP0z+pmWg46UNqzlKLyv5
 Y/fVHHshlRU=
 =QhOf
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- file-posix: Handle `EINVAL` fallocate return value
- qemu-img convert -n: Keep qcow2 v2 target sparse

# gpg: Signature made Tue 21 Jul 2020 16:45:28 BST
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream:
  iotests: Test sparseness for qemu-img convert -n
  qcow2: Implement v2 zero writes with discard if possible
  file-posix: Handle `EINVAL` fallocate return value

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-07-21 19:25:48 +01:00
Peter Maydell b50dab9eca QOM patches for 2020-07-21
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl8XDGsSHGFybWJydUBy
 ZWRoYXQuY29tAAoJEDhwtADrkYZTPTYP/11AHdREDvz+KDl7AXtndxE9wo6GPGUy
 GhMSXODgOm6a4/nk4xQQR7MU+57C2IMnYbAAdrNXA3YVHjUPMURVsAeTsl9ruchC
 1JiNrtXKgaWv5P8WeNzuXcH9B7n3UZ/mV4FH8v0FhjZhsH6EHP0zOhgNScPyQurz
 AsmYN5hSoSz8jRFQ8xlNzBhNrYX5dG6fOs4M6yBRUgzaCuRcSn/+cgjngfAmcH4k
 F0KK+rBQA+KUAKYp0QUaUcbD5oj/6KaQMsfqsFROR8m+AAJrhkP2ffmcDCHfCW8A
 SQsp8k2SkdHXNgRoDJSea1TYMitCFrTBDv+MfBWOLH4ewQrGAzvsuG5o3FlnVeN7
 CKHkkxqOEifJ/vnThVBTIsqxf8/HefDXi1B5BXSKYSvnPKnnh8HCZ7VxvAsNGZiR
 epr4gEGBCcb9/bXFUmVVPz6H+lUWORzhF4P0spNJwH5BT9FLybWgJt9o4KH+pxYA
 DL4GF5vOkBgIhgUR+vn535vik7M38u6gsB8m22s2FRkZmIxTpxp9eH2ehHGSoVYM
 Yl1kVzJmFMPakl0gG1dMmM4+DJGRTHLCfBFS4pzMs9DaCNHUF3CB8FjXQs3NIFCS
 XGnChbri/wF83DEueTIrUAqF2w0XgEy55aVBOZkmFT6DXPXbx+Y3Q/AomaktBLyv
 FFUe9SfMn63P
 =HTNj
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-qom-2020-07-21' into staging

QOM patches for 2020-07-21

# gpg: Signature made Tue 21 Jul 2020 16:40:27 BST
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-qom-2020-07-21:
  qom: Make info qom-tree sort children more efficiently
  qom: Document object_get_canonical_path() returns malloced string
  qom: Change object_get_canonical_path_component() not to malloc

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-07-21 18:31:52 +01:00
Kevin Wolf 61b3043965 qcow2: Implement v2 zero writes with discard if possible
qcow2 version 2 images don't support the zero flag for clusters, so for
write_zeroes requests, we return -ENOTSUP and get explicit zero buffer
writes. If the image doesn't have a backing file, we can do better: Just
discard the respective clusters.

This is relevant for 'qemu-img convert -O qcow2 -n', where qemu-img has
to assume that the existing target image may contain any data, so it has
to write zeroes. Without this patch, this results in a fully allocated
target image, even if the source image was empty.

Reported-by: Nir Soffer <nsoffer@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200721135520.72355-2-kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-07-21 16:28:57 +02:00
Antoine Damhet bae127d4dc file-posix: Handle `EINVAL` fallocate return value
The `detect-zeroes=unmap` option may issue unaligned
`FALLOC_FL_PUNCH_HOLE` requests, raw block devices can (and will) return
`EINVAL`, qemu should then write the zeroes to the blockdev instead of
issuing an `IO_ERROR`.

The problem can be reprodced like this:

$ qemu-io -c 'write -P 0 42 1234' --image-opts driver=host_device,filename=/dev/loop0,detect-zeroes=unmap
write failed: Invalid argument

Signed-off-by: Antoine Damhet <antoine.damhet@blade-group.com>
Message-Id: <20200717135603.51180-1-antoine.damhet@blade-group.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-07-21 16:28:57 +02:00
Markus Armbruster 7a309cc95b qom: Change object_get_canonical_path_component() not to malloc
object_get_canonical_path_component() returns a malloced copy of a
property name on success, null on failure.

19 of its 25 callers immediately free the returned copy.

Change object_get_canonical_path_component() to return the property
name directly.  Since modifying the name would be wrong, adjust the
return type to const char *.

Drop the free from the 19 callers become simpler, add the g_strdup()
to the other six.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200714160202.3121879-4-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
2020-07-21 16:23:43 +02:00
Stefan Hajnoczi 1d719ddc35 block: fix bdrv_aio_cancel() for ENOMEDIUM requests
bdrv_aio_cancel() calls aio_poll() on the AioContext for the given I/O
request until it has completed. ENOMEDIUM requests are special because
there is no BlockDriverState when the drive has no medium!

Define a .get_aio_context() function for BlkAioEmAIOCB requests so that
bdrv_aio_cancel() can find the AioContext where the completion BH is
pending. Without this function bdrv_aio_cancel() aborts on ENOMEDIUM
requests!

libFuzzer triggered the following assertion:

  cat << EOF | qemu-system-i386 -M pc-q35-5.0 \
    -nographic -monitor none -serial none \
    -qtest stdio -trace ide\*
  outl 0xcf8 0x8000fa24
  outl 0xcfc 0xe106c000
  outl 0xcf8 0x8000fa04
  outw 0xcfc 0x7
  outl 0xcf8 0x8000fb20
  write 0x0 0x3 0x2780e7
  write 0xe106c22c 0xd 0x1130c218021130c218021130c2
  write 0xe106c218 0x15 0x110010110010110010110010110010110010110010
  EOF
  ide_exec_cmd IDE exec cmd: bus 0x56170a77a2b8; state 0x56170a77a340; cmd 0xe7
  ide_reset IDEstate 0x56170a77a340
  Aborted (core dumped)

  (gdb) bt
  #1  0x00007ffff4f93895 in abort () at /lib64/libc.so.6
  #2  0x0000555555dc6c00 in bdrv_aio_cancel (acb=0x555556765550) at block/io.c:2745
  #3  0x0000555555dac202 in blk_aio_cancel (acb=0x555556765550) at block/block-backend.c:1546
  #4  0x0000555555b1bd74 in ide_reset (s=0x555557213340) at hw/ide/core.c:1318
  #5  0x0000555555b1e3a1 in ide_bus_reset (bus=0x5555572132b8) at hw/ide/core.c:2422
  #6  0x0000555555b2aa27 in ahci_reset_port (s=0x55555720eb50, port=2) at hw/ide/ahci.c:650
  #7  0x0000555555b29fd7 in ahci_port_write (s=0x55555720eb50, port=2, offset=44, val=16) at hw/ide/ahci.c:360
  #8  0x0000555555b2a564 in ahci_mem_write (opaque=0x55555720eb50, addr=556, val=16, size=1) at hw/ide/ahci.c:513
  #9  0x000055555598415b in memory_region_write_accessor (mr=0x55555720eb80, addr=556, value=0x7fffffffb838, size=1, shift=0, mask=255, attrs=...) at softmmu/memory.c:483

Looking at bdrv_aio_cancel:

2728 /* async I/Os */
2729
2730 void bdrv_aio_cancel(BlockAIOCB *acb)
2731 {
2732     qemu_aio_ref(acb);
2733     bdrv_aio_cancel_async(acb);
2734     while (acb->refcnt > 1) {
2735         if (acb->aiocb_info->get_aio_context) {
2736             aio_poll(acb->aiocb_info->get_aio_context(acb), true);
2737         } else if (acb->bs) {
2738             /* qemu_aio_ref and qemu_aio_unref are not thread-safe, so
2739              * assert that we're not using an I/O thread.  Thread-safe
2740              * code should use bdrv_aio_cancel_async exclusively.
2741              */
2742             assert(bdrv_get_aio_context(acb->bs) == qemu_get_aio_context());
2743             aio_poll(bdrv_get_aio_context(acb->bs), true);
2744         } else {
2745             abort();     <===============
2746         }
2747     }
2748     qemu_aio_unref(acb);
2749 }

Fixes: 02c50efe08 ("block: Add bdrv_aio_cancel_async")
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Buglink: https://bugs.launchpad.net/qemu/+bug/1878255
Originally-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200720100141.129739-1-stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-21 12:00:38 +02:00
Maxim Levitsky 662d0c5392 block/crypto: disallow write sharing by default
My commit 'block/crypto: implement the encryption key management'
accidently allowed raw luks images to be shared between different
qemu processes without share-rw=on explicit override.
Fix that.

Fixes: bbfdae91fb ("block/crypto: implement the encryption key management")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1857490

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200719122059.59843-2-mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-21 10:49:02 +02:00
Kevin Wolf a8c5cf27c9 file-posix: Fix leaked fd in raw_open_common() error path
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200717105426.51134-4-kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-07-17 14:20:57 +02:00
Kevin Wolf bca5283bd4 file-posix: Fix check_hdev_writable() with auto-read-only
For Linux block devices, being able to open the device read-write
doesn't necessarily mean that the device is actually writable (one
example is a read-only LV, as you get with lvchange -pr <device>). We
have check_hdev_writable() to check this condition and fail opening the
image read-write if it's not actually writable.

However, this check doesn't take auto-read-only into account, but
results in a hard failure instead of downgrading to read-only where
possible.

Fix this and do the writable check not based on BDRV_O_RDWR, but only
when this actually results in opening the file read-write. A second
check is inserted in raw_reconfigure_getfd() to have the same check when
dynamic auto-read-only upgrades an image file from read-only to
read-write.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200717105426.51134-3-kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-07-17 14:20:57 +02:00
Kevin Wolf 20eaf1bf6e file-posix: Move check_hdev_writable() up
We'll need to call it in raw_open_common(), so move the function to
avoid a forward declaration.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200717105426.51134-2-kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-07-17 14:20:57 +02:00
Kevin Wolf 5edc85571e file-posix: Allow byte-aligned O_DIRECT with NFS
Since commit a6b257a08e ('file-posix: Handle undetectable alignment'),
we assume that if we open a file with O_DIRECT and alignment probing
returns 1, we just couldn't find out the real alignment requirement
because some filesystems make the requirement only for allocated blocks.
In this case, a safe default of 4k is used.

This is too strict for NFS, which does actually allow byte-aligned
requests even with O_DIRECT. Because we can't distinguish both cases
with generic code, let's just look at the file system magic and disable
s->needs_alignment for NFS. This way, O_DIRECT can still be used on NFS
for images that are not aligned to 4k.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200716142601.111237-3-kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-07-17 14:20:57 +02:00
Marc-André Lureau a08464521c Remove VXHS block device
The vxhs code doesn't compile since v2.12.0. There's no point in fixing
and then adding CI for a config that our users have demonstrated that
they do not use; better to just remove it.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200711065926.2204721-1-marcandre.lureau@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-07-17 14:20:57 +02:00
Peter Maydell d2628b1eb7 Block layer patches:
- file-posix: Mitigate file fragmentation with extent size hints
 - Tighten qemu-img rules on missing backing format
 - qemu-img map: Don't limit block status request size
 - Fix crash with virtio-scsi and iothreads
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAl8NsgMRHGt3b2xmQHJl
 ZGhhdC5jb20ACgkQfwmycsiPL9Z0tA//eqauxD7cTEpwrtLNrRtpiBtMG64BBpxz
 QfkURzB38bMVahHlwq3Gt7Zcov8V4V7vxK66h688Z/fhw3vmqIeVe8+P6+Y5s9FL
 jil8lewHuLTa6xELeugoV7SZXH8AAh1W2fQmiR7EPiOmpSE0wf7C5IShVlX8A04E
 r0n09+61qGjRIe1hNTwTtldqQEfx6UGnxQWcQb81JUPA1lZhX3cnPg/j94Bofr+m
 v/DbVTfsmUtTMjc0PdU7n4DKTWu8OS5B/X0unF21rTtO//cYBrhAeY3ax2jbFBWi
 CIZK8HLI5m9/HFyltql1LOsd+B5TtfnXMfSdvDh2jaVUlto7wTeTnWU1fv4wxUB5
 hk7XgJo/y203ebFNHpTmW8tvLfGTP8uqCVfOEFxzjy+JHGrarlbWkwL2LMOFFAZ2
 s2WcwlfqiYGFTG4+OFdhPf9qPWKSqMr+jTdZJTse64/c6+YXWHk+pP9lfYEUOgSi
 OYwdQUY9uiZ1K13q5Tif2TbFvs+c118xdTgVhAV7VtfPnWc3c647dX7iaq8Szknc
 IT93670Iqf/PzEj+L7XUbbLIIsAcmxD0sr7QAQEt7bfiYIDRIQLiVPyzXplETFg2
 SEkvtqBovm84ct7pWQzqA6lFvr3oIFDNquR40XFGozHNnlBeNi5s7pXQnqUBLElr
 wDDuEi+z5QM=
 =DB0q
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- file-posix: Mitigate file fragmentation with extent size hints
- Tighten qemu-img rules on missing backing format
- qemu-img map: Don't limit block status request size
- Fix crash with virtio-scsi and iothreads

# gpg: Signature made Tue 14 Jul 2020 14:24:19 BST
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream:
  block: Avoid stale pointer dereference in blk_get_aio_context()
  qemu-img: Deprecate use of -b without -F
  block: Add support to warn on backing file change without format
  iotests: Specify explicit backing format where sensible
  qcow2: Deprecate use of qemu-img amend to change backing file
  block: Error if backing file fails during creation without -u
  qcow: Tolerate backing_fmt=
  vmdk: Add trivial backing_fmt support
  sheepdog: Add trivial backing_fmt support
  block: Finish deprecation of 'qemu-img convert -n -o'
  qemu-img: Flush stdout before before potential stderr messages
  file-posix: Mitigate file fragmentation with extent size hints
  iotests/059: Filter out disk size with more standard filter
  qemu-img map: Don't limit block status request size
  iotests: Simplify _filter_img_create() a bit

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-07-14 19:39:52 +01:00
Greg Kurz e6cada9231 block: Avoid stale pointer dereference in blk_get_aio_context()
It is possible for blk_remove_bs() to race with blk_drain_all(), causing
the latter to dereference a stale blk->root pointer:

  blk_remove_bs(blk)
   bdrv_root_unref_child(blk->root)
    child_bs = blk->root->bs
    bdrv_detach_child(blk->root)
     ...
     g_free(blk->root) <============== blk->root becomes stale
    bdrv_unref(child_bs) <============ yield at some point

A blk_drain_all() can be triggered by some guest action in the
meantime, eg. on POWER, SLOF might disable bus mastering on
a virtio-scsi-pci device:

  virtio_write_config()
   virtio_pci_stop_ioeventfd()
    virtio_bus_stop_ioeventfd()
     virtio_scsi_dataplane_stop()
      blk_drain_all()
       blk_get_aio_context()
       bs = blk->root ? blk->root->bs : NULL
            ^^^^^^^^^
              stale

Then, depending on one's luck, QEMU either crashes with SEGV or
hits the assertion in blk_get_aio_context().

blk->root is set by blk_insert_bs() which calls bdrv_root_attach_child()
first. The blk_remove_bs() function should rollback the changes made
by blk_insert_bs() in the opposite order (or it should be documented
somewhere why this isn't the case). Clear blk->root before calling
bdrv_root_unref_child() in blk_remove_bs().

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <159430264541.389456.11925072456012783045.stgit@bahia.lan>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-07-14 15:24:15 +02:00
Eric Blake e54ee1b385 block: Add support to warn on backing file change without format
For now, this is a mechanical addition; all callers pass false. But
the next patch will use it to improve 'qemu-img rebase -u' when
selecting a backing file with no format.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Message-Id: <20200706203954.341758-10-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-07-14 15:18:59 +02:00
Eric Blake bc5ee6da71 qcow2: Deprecate use of qemu-img amend to change backing file
The use of 'qemu-img amend' to change qcow2 backing files is not
tested very well.  In particular, our implementation has a bug where
if a new backing file is provided without a format, then the prior
format is blindly reused, even if this results in data corruption, but
this is not caught by iotests.

There are also situations where amending other options needs access to
the original backing file (for example, on a downgrade to a v2 image,
knowing whether a v3 zero cluster must be allocated or may be left
unallocated depends on knowing whether the backing file already reads
as zero), but the command line does not have a nice way to tell us
both the backing file to use for opening the image as well as the
backing file to install after the operation is complete.

Even if we do allow changing the backing file, it is redundant with
the existing ability to change backing files via 'qemu-img rebase -u'.
It is time to deprecate this support (leaving the existing behavior
intact, even if it is buggy), and at a point in the future, require
the use of only 'qemu-img rebase' for adjusting backing chain
relations, saving 'qemu-img amend' for changes unrelated to the
backing chain.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200706203954.341758-8-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-07-14 15:18:59 +02:00
Eric Blake 344acbd62f qcow: Tolerate backing_fmt=
qcow has no space in the metadata to store a backing format, and there
are existing qcow images backed both by raw or by other formats
(usually qcow) images, reliant on probing to tell the difference.  On
the bright side, because we probe every time, raw files are marked as
probed and we thus forbid a commit action into the backing file where
guest-controlled contents could change the result of the probe next
time around (the iotest added here proves that).

Still, allowing the user to specify the backing format during
creation, even if we can't record it, is a good thing.  This patch
blindly allows any value that resolves to a known driver, even if the
user's request is a mismatch from what probing finds; then the next
patch will further enhance things to verify that the user's request
matches what we actually probe.  With this and the next patch in
place, we will finally be ready to deprecate the creation of images
where a backing format was not explicitly specified by the user.

Note that this is only for QemuOpts usage; there is no change to the
QAPI to allow a format through -blockdev.

Add a new iotest 301 just for qcow, to demonstrate the latest
behavior, and to make it easier to show the improvements made in the
next patch.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200706203954.341758-6-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-07-14 15:18:59 +02:00
Eric Blake d51a814cf4 vmdk: Add trivial backing_fmt support
vmdk already requires that if backing_file is present, that it be
another vmdk image (see vmdk_co_do_create).  Meanwhile, we want to
move towards always being explicit about the backing format for other
drivers where it matters.  So for convenience, make qemu-img create -F
vmdk work, while rejecting all other explicit formats (note that this
is only for QemuOpts usage; there is no change to the QAPI to allow a
format through -blockdev).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200706203954.341758-5-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-07-14 15:18:59 +02:00
Eric Blake 80fa43e7df sheepdog: Add trivial backing_fmt support
Sheepdog already requires that if backing_file is present, that it be
another sheepdog image (see sd_co_create).  Meanwhile, we want to move
towards always being explicit about the backing format for other
drivers where it matters.  So for convenience, make qemu-img create -F
sheepdog work, while rejecting all other explicit formats (note that
this is only for QemuOpts usage; there is no change to the QAPI to
allow a format through -blockdev).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200706203954.341758-4-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-07-14 15:18:59 +02:00
Kevin Wolf ffa244c84a file-posix: Mitigate file fragmentation with extent size hints
Especially when O_DIRECT is used with image files so that the page cache
indirection can't cause a merge of allocating requests, the file will
fragment on the file system layer, with a potentially very small
fragment size (this depends on the requests the guest sent).

On Linux, fragmentation can be reduced by setting an extent size hint
when creating the file (at least on XFS, it can't be set any more after
the first extent has been allocated), basically giving raw files a
"cluster size" for allocation.

This adds a create option to set the extent size hint, and changes the
default from not setting a hint to setting it to 1 MB. The main reason
why qcow2 defaults to smaller cluster sizes is that COW becomes more
expensive, which is not an issue with raw files, so we can choose a
larger size. The tradeoff here is only potentially wasted disk space.

For qcow2 (or other image formats) over file-posix, the advantage should
even be greater because they grow sequentially without leaving holes, so
there won't be wasted space. Setting even larger extent size hints for
such images may make sense. This can be done with the new option, but
let's keep the default conservative for now.

The effect is very visible with a test that intentionally creates a
badly fragmented file with qemu-img bench (the time difference while
creating the file is already remarkable) and then looks at the number of
extents and the time a simple "qemu-img map" takes.

Without an extent size hint:

    $ ./qemu-img create -f raw -o extent_size_hint=0 ~/tmp/test.raw 10G
    Formatting '/home/kwolf/tmp/test.raw', fmt=raw size=10737418240 extent_size_hint=0
    $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 0
    Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 0, step size 8192)
    Run completed in 25.848 seconds.
    $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 4096
    Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 4096, step size 8192)
    Run completed in 19.616 seconds.
    $ filefrag ~/tmp/test.raw
    /home/kwolf/tmp/test.raw: 2000000 extents found
    $ time ./qemu-img map ~/tmp/test.raw
    Offset          Length          Mapped to       File
    0               0x1e8480000     0               /home/kwolf/tmp/test.raw

    real    0m1,279s
    user    0m0,043s
    sys     0m1,226s

With the new default extent size hint of 1 MB:

    $ ./qemu-img create -f raw -o extent_size_hint=1M ~/tmp/test.raw 10G
    Formatting '/home/kwolf/tmp/test.raw', fmt=raw size=10737418240 extent_size_hint=1048576
    $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 0
    Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 0, step size 8192)
    Run completed in 11.833 seconds.
    $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 4096
    Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 4096, step size 8192)
    Run completed in 10.155 seconds.
    $ filefrag ~/tmp/test.raw
    /home/kwolf/tmp/test.raw: 178 extents found
    $ time ./qemu-img map ~/tmp/test.raw
    Offset          Length          Mapped to       File
    0               0x1e8480000     0               /home/kwolf/tmp/test.raw

    real    0m0,061s
    user    0m0,040s
    sys     0m0,014s

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200707142329.48303-1-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-07-14 15:18:59 +02:00
Eric Blake 00d69986da nbd: Avoid off-by-one in long export name truncation
When snprintf returns the same value as the buffer size, the final
byte was truncated to ensure a NUL terminator.  Fortunately, such long
export names are unusual enough, with no real impact other than what
is displayed to the user.

Fixes: 5c86bdf120
Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200622210355.414941-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2020-07-13 09:01:01 -05:00
Xie Yongji c58daf76a6 iscsi: return -EIO when sense fields are meaningless
When an I/O request failed, now we only return correct
value on scsi check condition. We should also have a
default errno such as -EIO in other case.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Message-Id: <20200701105444.3226-2-xieyongji@bytedance.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 18:02:23 -04:00
Xie Yongji dd3b00202a iscsi: handle check condition status in retry loop
The handling of check condition was incorrect because
we would only do it after retries exceed maximum.

Fixes: 8c460269aa ("iscsi: base all handling of check condition on scsi_sense_to_errno")
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Message-Id: <20200701105444.3226-1-xieyongji@bytedance.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 18:02:23 -04:00
Vladimir Sementsov-Ogievskiy 795d946d07 nbd: Use ERRP_GUARD()
If we want to check error after errp-function call, we need to
introduce local_err and then propagate it to errp. Instead, use
the ERRP_GUARD() macro, benefits are:
1. No need of explicit error_propagate call
2. No need of explicit local_err variable: use errp directly
3. ERRP_GUARD() leaves errp as is if it's not NULL or
   &error_fatal, this means that we don't break error_abort
   (we'll abort on error_set, not on error_propagate)

If we want to add some info to errp (by error_prepend() or
error_append_hint()), we must use the ERRP_GUARD() macro.
Otherwise, this info will not be added when errp == &error_fatal
(the program will exit prior to the error_append_hint() or
error_prepend() call).  Fix several such cases, e.g. in nbd_read().

This commit is generated by command

    sed -n '/^Network Block Device (NBD)$/,/^$/{s/^F: //p}' \
        MAINTAINERS | \
    xargs git ls-files | grep '\.[hc]$' | \
    xargs spatch \
        --sp-file scripts/coccinelle/errp-guard.cocci \
        --macro-file scripts/cocci-macro-file.h \
        --in-place --no-show-diff --max-width 80

Reported-by: Kevin Wolf <kwolf@redhat.com>
Reported-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200707165037.1026246-8-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[ERRP_AUTO_PROPAGATE() renamed to ERRP_GUARD(), and
auto-propagated-errp.cocci to errp-guard.cocci.  Commit message
tweaked again.]
2020-07-10 15:18:09 +02:00
Markus Armbruster 386f6c07d2 error: Avoid error_propagate() after migrate_add_blocker()
When migrate_add_blocker(blocker, &errp) is followed by
error_propagate(errp, err), we can often just as well do
migrate_add_blocker(..., errp).

Do that with this Coccinelle script:

    @@
    expression blocker, err, errp;
    expression ret;
    @@
    -    ret = migrate_add_blocker(blocker, &err);
    -    if (err) {
    +    ret = migrate_add_blocker(blocker, errp);
    +    if (ret < 0) {
             ... when != err;
    -        error_propagate(errp, err);
             ...
         }

    @@
    expression blocker, err, errp;
    @@
    -    migrate_add_blocker(blocker, &err);
    -    if (err) {
    +    if (migrate_add_blocker(blocker, errp) < 0) {
             ... when != err;
    -        error_propagate(errp, err);
             ...
         }

Double-check @err is not used afterwards.  Dereferencing it would be
use after free, but checking whether it's null would be legitimate.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200707160613.848843-43-armbru@redhat.com>
2020-07-10 15:18:08 +02:00
Markus Armbruster b11a093c60 qapi: Smooth another visitor error checking pattern
Convert

    visit_type_FOO(v, ..., &ptr, &err);
    ...
    if (err) {
        ...
    }

to

    visit_type_FOO(v, ..., &ptr, errp);
    ...
    if (!ptr) {
        ...
    }

for functions that set @ptr to non-null / null on success / error.

Eliminate error_propagate() that are now unnecessary.  Delete @err
that are now unused.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200707160613.848843-40-armbru@redhat.com>
2020-07-10 15:18:08 +02:00
Markus Armbruster 4bc6d7ee0e block/parallels: Simplify parallels_open() after previous commit
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200707160613.848843-39-armbru@redhat.com>
2020-07-10 15:18:08 +02:00
Markus Armbruster a5f9b9df25 error: Reduce unnecessary error propagation
When all we do with an Error we receive into a local variable is
propagating to somewhere else, we can just as well receive it there
right away, even when we need to keep error_propagate() for other
error paths.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200707160613.848843-38-armbru@redhat.com>
2020-07-10 15:18:08 +02:00
Markus Armbruster 992861fb1e error: Eliminate error_propagate() manually
When all we do with an Error we receive into a local variable is
propagating to somewhere else, we can just as well receive it there
right away.  The previous two commits did that for sufficiently simple
cases with Coccinelle.  Do it for several more manually.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200707160613.848843-37-armbru@redhat.com>
2020-07-10 15:18:08 +02:00
Markus Armbruster af175e85f9 error: Eliminate error_propagate() with Coccinelle, part 2
When all we do with an Error we receive into a local variable is
propagating to somewhere else, we can just as well receive it there
right away.  The previous commit did that with a Coccinelle script I
consider fairly trustworthy.  This commit uses the same script with
the matching of return taken out, i.e. we convert

    if (!foo(..., &err)) {
        ...
        error_propagate(errp, err);
        ...
    }

to

    if (!foo(..., errp)) {
        ...
        ...
    }

This is unsound: @err could still be read between afterwards.  I don't
know how to express "no read of @err without an intervening write" in
Coccinelle.  Instead, I manually double-checked for uses of @err.

Suboptimal line breaks tweaked manually.  qdev_realize() simplified
further to placate scripts/checkpatch.pl.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200707160613.848843-36-armbru@redhat.com>
2020-07-10 15:18:08 +02:00
Markus Armbruster 668f62ec62 error: Eliminate error_propagate() with Coccinelle, part 1
When all we do with an Error we receive into a local variable is
propagating to somewhere else, we can just as well receive it there
right away.  Convert

    if (!foo(..., &err)) {
        ...
        error_propagate(errp, err);
        ...
        return ...
    }

to

    if (!foo(..., errp)) {
        ...
        ...
        return ...
    }

where nothing else needs @err.  Coccinelle script:

    @rule1 forall@
    identifier fun, err, errp, lbl;
    expression list args, args2;
    binary operator op;
    constant c1, c2;
    symbol false;
    @@
         if (
    (
    -        fun(args, &err, args2)
    +        fun(args, errp, args2)
    |
    -        !fun(args, &err, args2)
    +        !fun(args, errp, args2)
    |
    -        fun(args, &err, args2) op c1
    +        fun(args, errp, args2) op c1
    )
            )
         {
             ... when != err
                 when != lbl:
                 when strict
    -        error_propagate(errp, err);
             ... when != err
    (
             return;
    |
             return c2;
    |
             return false;
    )
         }

    @rule2 forall@
    identifier fun, err, errp, lbl;
    expression list args, args2;
    expression var;
    binary operator op;
    constant c1, c2;
    symbol false;
    @@
    -    var = fun(args, &err, args2);
    +    var = fun(args, errp, args2);
         ... when != err
         if (
    (
             var
    |
             !var
    |
             var op c1
    )
            )
         {
             ... when != err
                 when != lbl:
                 when strict
    -        error_propagate(errp, err);
             ... when != err
    (
             return;
    |
             return c2;
    |
             return false;
    |
             return var;
    )
         }

    @depends on rule1 || rule2@
    identifier err;
    @@
    -    Error *err = NULL;
         ... when != err

Not exactly elegant, I'm afraid.

The "when != lbl:" is necessary to avoid transforming

         if (fun(args, &err)) {
             goto out
         }
         ...
     out:
         error_propagate(errp, err);

even though other paths to label out still need the error_propagate().
For an actual example, see sclp_realize().

Without the "when strict", Coccinelle transforms vfio_msix_setup(),
incorrectly.  I don't know what exactly "when strict" does, only that
it helps here.

The match of return is narrower than what I want, but I can't figure
out how to express "return where the operand doesn't use @err".  For
an example where it's too narrow, see vfio_intx_enable().

Silently fails to convert hw/arm/armsse.c, because Coccinelle gets
confused by ARMSSE being used both as typedef and function-like macro
there.  Converted manually.

Line breaks tidied up manually.  One nested declaration of @local_err
deleted manually.  Preexisting unwanted blank line dropped in
hw/riscv/sifive_e.c.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200707160613.848843-35-armbru@redhat.com>
2020-07-10 15:18:08 +02:00
Markus Armbruster dcfe480544 error: Avoid unnecessary error_propagate() after error_setg()
Replace

    error_setg(&err, ...);
    error_propagate(errp, err);

by

    error_setg(errp, ...);

Related pattern:

    if (...) {
        error_setg(&err, ...);
        goto out;
    }
    ...
 out:
    error_propagate(errp, err);
    return;

When all paths to label out are that way, replace by

    if (...) {
        error_setg(errp, ...);
        return;
    }

and delete the label along with the error_propagate().

When we have at most one other path that actually needs to propagate,
and maybe one at the end that where propagation is unnecessary, e.g.

    foo(..., &err);
    if (err) {
        goto out;
    }
    ...
    bar(..., &err);
 out:
    error_propagate(errp, err);
    return;

move the error_propagate() to where it's needed, like

    if (...) {
        foo(..., &err);
        error_propagate(errp, err);
        return;
    }
    ...
    bar(..., errp);
    return;

and transform the error_setg() as above.

In some places, the transformation results in obviously unnecessary
error_propagate().  The next few commits will eliminate them.

Bonus: the elimination of gotos will make later patches in this series
easier to review.

Candidates for conversion tracked down with this Coccinelle script:

    @@
    identifier err, errp;
    expression list args;
    @@
    -    error_setg(&err, args);
    +    error_setg(errp, args);
         ... when != err
         error_propagate(errp, err);

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200707160613.848843-34-armbru@redhat.com>
2020-07-10 15:18:08 +02:00
Markus Armbruster 14217038bc qapi: Use returned bool to check for failure, manual part
The previous commit used Coccinelle to convert from checking the Error
object to checking the return value.  Convert a few more manually.
Also tweak control flow in places to conform to the conventional "if
error bail out" pattern.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200707160613.848843-20-armbru@redhat.com>
2020-07-10 15:18:08 +02:00
Markus Armbruster 62a35aaa31 qapi: Use returned bool to check for failure, Coccinelle part
The previous commit enables conversion of

    visit_foo(..., &err);
    if (err) {
        ...
    }

to

    if (!visit_foo(..., errp)) {
        ...
    }

for visitor functions that now return true / false on success / error.
Coccinelle script:

    @@
    identifier fun =~ "check_list|input_type_enum|lv_start_struct|lv_type_bool|lv_type_int64|lv_type_str|lv_type_uint64|output_type_enum|parse_type_bool|parse_type_int64|parse_type_null|parse_type_number|parse_type_size|parse_type_str|parse_type_uint64|print_type_bool|print_type_int64|print_type_null|print_type_number|print_type_size|print_type_str|print_type_uint64|qapi_clone_start_alternate|qapi_clone_start_list|qapi_clone_start_struct|qapi_clone_type_bool|qapi_clone_type_int64|qapi_clone_type_null|qapi_clone_type_number|qapi_clone_type_str|qapi_clone_type_uint64|qapi_dealloc_start_list|qapi_dealloc_start_struct|qapi_dealloc_type_anything|qapi_dealloc_type_bool|qapi_dealloc_type_int64|qapi_dealloc_type_null|qapi_dealloc_type_number|qapi_dealloc_type_str|qapi_dealloc_type_uint64|qobject_input_check_list|qobject_input_check_struct|qobject_input_start_alternate|qobject_input_start_list|qobject_input_start_struct|qobject_input_type_any|qobject_input_type_bool|qobject_input_type_bool_keyval|qobject_input_type_int64|qobject_input_type_int64_keyval|qobject_input_type_null|qobject_input_type_number|qobject_input_type_number_keyval|qobject_input_type_size_keyval|qobject_input_type_str|qobject_input_type_str_keyval|qobject_input_type_uint64|qobject_input_type_uint64_keyval|qobject_output_start_list|qobject_output_start_struct|qobject_output_type_any|qobject_output_type_bool|qobject_output_type_int64|qobject_output_type_null|qobject_output_type_number|qobject_output_type_str|qobject_output_type_uint64|start_list|visit_check_list|visit_check_struct|visit_start_alternate|visit_start_list|visit_start_struct|visit_type_.*";
    expression list args;
    typedef Error;
    Error *err;
    @@
    -    fun(args, &err);
    -    if (err)
    +    if (!fun(args, &err))
         {
             ...
         }

A few line breaks tidied up manually.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200707160613.848843-19-armbru@redhat.com>
2020-07-10 15:18:08 +02:00
Markus Armbruster 235e59cf03 qemu-option: Use returned bool to check for failure
The previous commit enables conversion of

    foo(..., &err);
    if (err) {
        ...
    }

to

    if (!foo(..., &err)) {
        ...
    }

for QemuOpts functions that now return true / false on success /
error.  Coccinelle script:

    @@
    identifier fun = {
        opts_do_parse, parse_option_bool, parse_option_number,
        parse_option_size, qemu_opt_parse, qemu_opt_rename, qemu_opt_set,
        qemu_opt_set_bool, qemu_opt_set_number, qemu_opts_absorb_qdict,
        qemu_opts_do_parse, qemu_opts_from_qdict_entry, qemu_opts_set,
        qemu_opts_validate
    };
    expression list args, args2;
    typedef Error;
    Error *err;
    @@
    -    fun(args, &err, args2);
    -    if (err)
    +    if (!fun(args, &err, args2))
         {
             ...
         }

A few line breaks tidied up manually.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200707160613.848843-15-armbru@redhat.com>
[Conflict with commit 0b6786a9c1 "block/amend: refactor qcow2 amend
options" resolved by rerunning Coccinelle on master's version]
2020-07-10 15:17:35 +02:00
Markus Armbruster c6ecec43b2 qemu-option: Check return value instead of @err where convenient
Convert uses like

    opts = qemu_opts_create(..., &err);
    if (err) {
        ...
    }

to

    opts = qemu_opts_create(..., errp);
    if (!opts) {
        ...
    }

Eliminate error_propagate() that are now unnecessary.  Delete @err
that are now unused.

Note that we can't drop parallels_open()'s error_propagate() here.  We
continue to execute it even in the converted case.  It's a no-op then:
local_err is null.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200707160613.848843-8-armbru@redhat.com>
2020-07-10 15:01:06 +02:00
Eric Blake 365fed5111 qed: Simplify backing reads
The other four drivers that support backing files (qcow, qcow2,
parallels, vmdk) all rely on the block layer to populate zeroes when
reading beyond EOF of a short backing file.  We can simplify the qed
code by doing likewise.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200528094405.145708-11-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-06 10:34:14 +02:00
Vladimir Sementsov-Ogievskiy a2adbbf603 block: drop unallocated_blocks_are_zero
Currently this field only set by qed and qcow2. But in fact, all
backing-supporting formats (parallels, qcow, qcow2, qed, vmdk) share
these semantics: on unallocated blocks, if there is no backing file they
just memset the buffer with zeroes.

So, document this behavior for .supports_backing and drop
.unallocated_blocks_are_zero

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200528094405.145708-10-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-06 10:34:14 +02:00
Vladimir Sementsov-Ogievskiy cdf9ebf18f block/vhdx: drop unallocated_blocks_are_zero
vhdx doesn't have .bdrv_co_block_status handler, so DATA|ALLOCATED is
always assumed for it in bdrv_co_block_status().
unallocated_blocks_are_zero is useless (it doesn't affect the only user
of the field: bdrv_co_block_status()), drop it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200528094405.145708-9-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-06 10:34:14 +02:00
Vladimir Sementsov-Ogievskiy ac9185603e block/file-posix: drop unallocated_blocks_are_zero
raw_co_block_status() in block/file-posix.c never returns 0, so
unallocated_blocks_are_zero is useless (it doesn't affect the only user
of the field: bdrv_co_block_status()). Drop it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200528094405.145708-8-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-06 10:34:14 +02:00
Vladimir Sementsov-Ogievskiy 32d293c8c6 block/iscsi: drop unallocated_blocks_are_zero
We set bdi->unallocated_blocks_are_zero = iscsilun->lbprz, but
iscsi_co_block_status doesn't return 0 in case of iscsilun->lbprz, it
returns ZERO when appropriate. So actually unallocated_blocks_are_zero
is useless (it doesn't affect the only user of the field:
bdrv_co_block_status()). Drop it now.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200528094405.145708-7-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-06 10:34:14 +02:00
Vladimir Sementsov-Ogievskiy 74036395ea block/crypto: drop unallocated_blocks_are_zero
It's false by default, no needs to set it. We are going to drop this
variable at all, so drop it now here, it doesn't hurt.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200528094405.145708-6-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-06 10:34:14 +02:00
Vladimir Sementsov-Ogievskiy 2c060c0f50 block/vpc: return ZERO block-status when appropriate
In case when get_image_offset() returns -1, we do zero out the
corresponding chunk of qiov. So, this should be reported as ZERO.

Note that this changes visible output of "qemu-img map --output=json"
and "qemu-io -c map" commands. For qemu-img map, the change is obvious:
we just mark as zero what is really zero. For qemu-io it's less
obvious: what was unallocated now is allocated.

There is an inconsistency in understanding of unallocated regions in
Qemu: backing-supporting format-drivers return 0 block-status to report
go-to-backing logic for this area. Some protocol-drivers (iscsi) return
0 to report fs-unallocated-non-zero status (i.e., don't occupy space on
disk, read result is undefined).

BDRV_BLOCK_ALLOCATED is defined as something more close to
go-to-backing logic. Still it is calculated as ZERO | DATA, so 0 from
iscsi is treated as unallocated. It doesn't influence backing-chain
behavior, as iscsi can't have backing file. But it does influence
"qemu-io -c map".

We should solve this inconsistency at some future point. Now, let's
just make backing-not-supporting format drivers (vdi in the previous
patch and vpc now) to behave more like backing-supporting drivers
and not report 0 block-status. More over, returning ZERO status is
absolutely valid thing, and again, corresponds to how the other
format-drivers (backing-supporting) work.

After block-status update, it never reports 0, so setting
unallocated_blocks_are_zero doesn't make sense (as the only user of it
is bdrv_co_block_status and it checks unallocated_blocks_are_zero only
for unallocated areas). Drop it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200528094405.145708-5-vsementsov@virtuozzo.com>
[mreitz: qemu-io -c map as used by iotest 146 now reports everything as
         allocated; in order to make the test do something useful, we
         use qemu-img map --output=json now]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-06 10:32:38 +02:00
Vladimir Sementsov-Ogievskiy 2ea0332f42 block/vdi: return ZERO block-status when appropriate
In case of !VDI_IS_ALLOCATED[], we do zero out the corresponding chunk
of qiov. So, this should be reported as ZERO.

Note that this changes visible output of "qemu-img map --output=json"
and "qemu-io -c map" commands. For qemu-img map, the change is obvious:
we just mark as zero what is really zero. For qemu-io it's less
obvious: what was unallocated now is allocated.

There is an inconsistency in understanding of unallocated regions in
Qemu: backing-supporting format-drivers return 0 block-status to report
go-to-backing logic for this area. Some protocol-drivers (iscsi) return
0 to report fs-unallocated-non-zero status (i.e., don't occupy space on
disk, read result is undefined).

BDRV_BLOCK_ALLOCATED is defined as something more close to
go-to-backing logic. Still it is calculated as ZERO | DATA, so 0 from
iscsi is treated as unallocated. It doesn't influence backing-chain
behavior, as iscsi can't have backing file. But it does influence
"qemu-io -c map".

We should solve this inconsistency at some future point. Now, let's
just make backing-not-supporting format drivers (vdi at this patch and
vpc with the following) to behave more like backing-supporting drivers
and not report 0 block-status. More over, returning ZERO status is
absolutely valid thing, and again, corresponds to how the other
format-drivers (backing-supporting) work.

After block-status update, it never reports 0, so setting
unallocated_blocks_are_zero doesn't make sense (as the only user of it
is bdrv_co_block_status and it checks unallocated_blocks_are_zero only
for unallocated areas). Drop it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200528094405.145708-4-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-06 08:49:28 +02:00
Vladimir Sementsov-Ogievskiy 7b1efe996c block: inline bdrv_unallocated_blocks_are_zero()
The function has only one user: bdrv_co_block_status(). Inline it to
simplify reviewing of the following patches, which will finally drop
unallocated_blocks_are_zero field too.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200528094405.145708-3-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-06 08:49:28 +02:00
Maxim Levitsky 8ea1613d91 block/qcow2: implement blockdev-amend
Currently the implementation only supports amending the encryption
options, unlike the qemu-img version

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200608094030.670121-14-mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-06 08:49:28 +02:00
Maxim Levitsky 30da9dd88a block/crypto: implement blockdev-amend
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200608094030.670121-13-mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-06 08:49:28 +02:00
Maxim Levitsky ced914d0ab block/core: add generic infrastructure for x-blockdev-amend qmp command
blockdev-amend will be used similiar to blockdev-create
to allow on the fly changes of the structure of the format based block devices.

Current plan is to first support encryption keyslot management for luks
based formats (raw and embedded in qcow2)

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20200608094030.670121-12-mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-06 08:49:28 +02:00
Maxim Levitsky 90766d9db9 block/qcow2: extend qemu-img amend interface with crypto options
Now that we have all the infrastructure in place,
wire it in the qcow2 driver and expose this to the user.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200608094030.670121-9-mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-06 08:49:28 +02:00
Maxim Levitsky bbfdae91fb block/crypto: implement the encryption key management
This implements the encryption key management using the generic code in
qcrypto layer and exposes it to the user via qemu-img

This code adds another 'write_func' because the initialization
write_func works directly on the underlying file, and amend
works on instance of luks device.

This commit also adds a 'hack/workaround' I and Kevin Wolf (thanks)
made to make the driver both support write sharing (to avoid breaking the users),
and be safe against concurrent  metadata update (the keyslots)

Eventually the write sharing for luks driver will be deprecated
and removed together with this hack.

The hack is that we ask (as a format driver) for BLK_PERM_CONSISTENT_READ
and then when we want to update the keys, we unshare that permission.
So if someone else has the image open, even readonly, encryption
key update will fail gracefully.

Also thanks to Daniel Berrange for the idea of
unsharing read, rather that write permission which allows
to avoid cases when the other user had opened the image read-only.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200608094030.670121-8-mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-06 08:49:28 +02:00
Maxim Levitsky e0d0ddc591 block/crypto: rename two functions
rename the write_func to create_write_func, and init_func to create_init_func.
This is preparation for other write_func that will be used to update the encryption keys.

No functional changes

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20200608094030.670121-7-mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-06 08:49:28 +02:00
Maxim Levitsky 0b6786a9c1 block/amend: refactor qcow2 amend options
Some qcow2 create options can't be used for amend.
Remove them from the qcow2 create options and add generic logic to detect
such options in qemu-img

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[mreitz: Dropped some iotests reference output hunks that became
         unnecessary thanks to
         "iotests: Make _filter_img_create more active"]
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200625125548.870061-12-mreitz@redhat.com>
2020-07-06 08:49:28 +02:00
Maxim Levitsky df373fb0a3 block/amend: separate amend and create options for qemu-img
Some options are only useful for creation
(or hard to be amended, like cluster size for qcow2), while some other
options are only useful for amend, like upcoming keyslot management
options for luks

Since currently only qcow2 supports amend, move all its options
to a common macro and then include it in each action option list.

In future it might be useful to remove some options which are
not supported anyway from amend list, which currently
cause an error message if amended.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200608094030.670121-5-mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-06 08:49:28 +02:00
Maxim Levitsky a3579bfa0a block/amend: add 'force' option
'force' option will be used for some unsafe amend operations.

This includes things like erasing last keyslot in luks based formats
(which destroys the data, unless the master key is backed up
by external means), but that _might_ be desired result.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200608094030.670121-4-mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-06 08:49:28 +02:00
Maxim Levitsky 43cbd06df2 qcrypto/core: add generic infrastructure for crypto options amendment
This will be used first to implement luks keyslot management.

block_crypto_amend_opts_init will be used to convert
qemu-img cmdline to QCryptoBlockAmendOptions

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20200608094030.670121-2-mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-06 08:49:28 +02:00
Alberto Garcia a5675f3901 qcow2: Fix preallocation on images with unaligned sizes
When resizing an image with qcow2_co_truncate() using the falloc or
full preallocation modes the code assumes that both the old and new
sizes are cluster-aligned.

There are two problems with this:

  1) The calculation of how many clusters are involved does not always
     get the right result.

     Example: creating a 60KB image and resizing it (with
     preallocation=full) to 80KB won't allocate the second cluster.

  2) No copy-on-write is performed, so in the previous example if
     there is a backing file then the first 60KB of the first cluster
     won't be filled with data from the backing file.

This patch fixes both issues.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200617140036.20311-1-berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-06 08:33:06 +02:00
Vladimir Sementsov-Ogievskiy e8de7ba9ea block/block-copy: block_copy_dirty_clusters: fix failure check
ret may be > 0 on success path at this point. Fix assertion, which may
crash currently.

Fixes: 4ce5dd3e9b
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200526181347.489557-1-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-06 08:33:06 +02:00
Kevin Wolf 3dfa23b9ef vvfat: Fix array_remove_slice()
array_remove_slice() calls array_roll() with array->next - 1 as the
destination index. This is only correct for count == 1, otherwise we're
writing past the end of the array. array->next - count would be correct.

However, this is the only place ever calling array_roll(), so this
rather complicated operation isn't even necessary.

Fix the problem and simplify the code by replacing it with a single
memmove() call. array_roll() can now be removed.

Reported-by: Nathan Huckleberry <nhuck15@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200623175534.38286-3-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-07-03 09:37:03 +02:00
Kevin Wolf c79e243ed6 vvfat: Check that updated filenames are valid
FAT allows only a restricted set of characters in file names, and for
some of the illegal characters, it's actually important that we catch
them: If filenames can contain '/', the guest can construct filenames
containing "../" and escape from the assigned vvfat directory. The same
problem could arise if ".." was ever accepted as a literal filename.

Fix this by adding a check that all filenames are valid in
check_directory_consistency().

Reported-by: Nathan Huckleberry <nhuck15@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200623175534.38286-2-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-07-03 09:37:03 +02:00
Stefan Hajnoczi 7838c67f22 block/nvme: support nested aio_poll()
QEMU block drivers are supposed to support aio_poll() from I/O
completion callback functions. This means completion processing must be
re-entrant.

The standard approach is to schedule a BH during completion processing
and cancel it at the end of processing. If aio_poll() is invoked by a
callback function then the BH will run. The BH continues the suspended
completion processing.

All of this means that request A's cb() can synchronously wait for
request B to complete. Previously the nvme block driver would hang
because it didn't process completions from nested aio_poll().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20200617132201.1832152-8-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-06-23 15:46:08 +01:00
Stefan Hajnoczi b75fd5f554 block/nvme: keep BDRVNVMeState pointer in NVMeQueuePair
Passing around both BDRVNVMeState and NVMeQueuePair is unwieldy. Reduce
the number of function arguments by keeping the BDRVNVMeState pointer in
NVMeQueuePair. This will come in handly when a BH is introduced in a
later patch and only one argument can be passed to it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200617132201.1832152-7-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-06-23 15:46:08 +01:00
Stefan Hajnoczi a5db74f324 block/nvme: clarify that free_req_queue is protected by q->lock
Existing users access free_req_queue under q->lock. Document this.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200617132201.1832152-6-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-06-23 15:46:08 +01:00
Stefan Hajnoczi 1086e95da1 block/nvme: switch to a NVMeRequest freelist
There are three issues with the current NVMeRequest->busy field:
1. The busy field is accidentally accessed outside q->lock when request
   submission fails.
2. Waiters on free_req_queue are not woken when a request is returned
   early due to submission failure.
2. Finding a free request involves scanning all requests. This makes
   request submission O(n^2).

Switch to an O(1) freelist that is always accessed under the lock.

Also differentiate between NVME_QUEUE_SIZE, the actual SQ/CQ size, and
NVME_NUM_REQS, the number of usable requests. This makes the code
simpler than using NVME_QUEUE_SIZE everywhere and having to keep in mind
that one slot is reserved.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20200617132201.1832152-5-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-06-23 15:46:08 +01:00
Stefan Hajnoczi 04b3fb39c8 block/nvme: don't access CQE after moving cq.head
Do not access a CQE after incrementing q->cq.head and releasing q->lock.
It is unlikely that this causes problems in practice but it's a latent
bug.

The reason why it should be safe at the moment is that completion
processing is not re-entrant and the CQ doorbell isn't written until the
end of nvme_process_completion().

Make this change now because QEMU expects completion processing to be
re-entrant and later patches will do that.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200617132201.1832152-4-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-06-23 15:46:08 +01:00
Stefan Hajnoczi d38253cf8b block/nvme: drop tautologous assertion
nvme_process_completion() explicitly checks cid so the assertion that
follows is always true:

  if (cid == 0 || cid > NVME_QUEUE_SIZE) {
      ...
      continue;
  }
  assert(cid <= NVME_QUEUE_SIZE);

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200617132201.1832152-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-06-23 15:46:08 +01:00
Stefan Hajnoczi 2446e0e2e9 block/nvme: poll queues without q->lock
A lot of CPU time is spent simply locking/unlocking q->lock during
polling. Check for completion outside the lock to make q->lock disappear
from the profile.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20200617132201.1832152-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-06-23 15:46:08 +01:00
Eric Blake f17d684770 qcow2: Tweak comments on qcow2_get_persistent_dirty_bitmap_size
For now, we don't have persistent bitmaps in any other formats, but
that might not be true in the future.  Make it obvious that our
incoming parameter is not necessarily a qcow2 image, and therefore is
limited to just the bdrv_dirty_bitmap_* API calls (rather than probing
into qcow2 internals).

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200608190821.3293867-1-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-06-17 14:53:39 +02:00
Eric Blake e37adbebd1 block: Refactor subdirectory recursion during make
Rather than listing block/monitor from the top-level Makefile.objs, we
should instead list monitor from block/Makefile.objs.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Fixes: bb4e58c613
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200608173339.3244211-1-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-06-17 14:53:39 +02:00
Eric Blake 5c86bdf120 block: Call attention to truncation of long NBD exports
Commit 93676c88 relaxed our NBD client code to request export names up
to the NBD protocol maximum of 4096 bytes without NUL terminator, even
though the block layer can't store anything longer than 4096 bytes
including NUL terminator for display to the user.  Since this means
there are some export names where we have to truncate things, we can
at least try to make the truncation a bit more obvious for the user.
Note that in spite of the truncated display name, we can still
communicate with an NBD server using such a long export name; this was
deemed nicer than refusing to even connect to such a server (since the
server may not be under our control, and since determining our actual
length limits gets tricky when nbd://host:port/export and
nbd+unix:///export?socket=/path are themselves variable-length
expansions beyond the export name but count towards the block layer
name length).

Reported-by: Xueqiang Wei <xuwei@redhat.com>
Fixes: https://bugzilla.redhat.com/1843684
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200610163741.3745251-3-eblake@redhat.com>
2020-06-10 12:58:59 -05:00
Vladimir Sementsov-Ogievskiy 7d2410cea1 block: Factor out bdrv_run_co()
We have a few bdrv_*() functions that can either spawn a new coroutine
and wait for it with BDRV_POLL_WHILE() or use a fastpath if they are
alreeady running in a coroutine. All of them duplicate basically the
same code.

Factor the common code into a new function bdrv_run_co().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20200520144901.16589-1-vsementsov@virtuozzo.com
   [Factor out bdrv_run_co_entry too]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-06-05 09:54:48 +01:00
Stefano Garzarella 769335ecb1 io_uring: use io_uring_cq_ready() to check for ready cqes
In qemu_luring_poll_cb() we are not using the cqe peeked from the
CQ ring. We are using io_uring_peek_cqe() only to see if there
are cqes ready, so we can replace it with io_uring_cq_ready().

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20200519134942.118178-1-sgarzare@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-06-05 09:54:48 +01:00
Stefano Garzarella b4e44c9944 io_uring: retry io_uring_submit() if it fails with errno=EINTR
As recently documented [1], io_uring_enter(2) syscall can return an
error (errno=EINTR) if the operation was interrupted by a delivery
of a signal before it could complete.

This should happen when IORING_ENTER_GETEVENTS flag is used, for
example during io_uring_submit_and_wait() or during io_uring_submit()
when IORING_SETUP_IOPOLL is enabled.

We shouldn't have this problem for now, but it's better to prevent it.

[1] 344355ec66

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20200519133041.112138-1-sgarzare@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-06-05 09:54:48 +01:00
Eric Blake 5d72c68b49 qcow2: Expose bitmaps' size during measure
It's useful to know how much space can be occupied by qcow2 persistent
bitmaps, even though such metadata is unrelated to the guest-visible
data.  Report this value as an additional QMP field, present when
measuring an existing image and output format that both support
bitmaps.  Update iotest 178 and 190 to updated output, as well as new
coverage in 190 demonstrating non-zero values made possible with the
recently-added qemu-img bitmap command (see 3b51ab4b).

The new 'bitmaps size:' field is displayed automatically as part of
'qemu-img measure' any time it is present in QMP (that is, any time
both the source image being measured and destination format support
bitmaps, even if the measurement is 0 because there are no bitmaps
present).  If the field is absent, it means that no bitmaps can be
copied (source, destination, or both lack bitmaps, including when
measuring based on size rather than on a source image).  This behavior
is compatible with an upcoming patch adding 'qemu-img convert
--bitmaps': that command will fail in the same situations where this
patch omits the field.

The addition of a new field demonstrates why we should always
zero-initialize qapi C structs; while the qcow2 driver still fully
populates all fields, the raw and crypto drivers had to be tweaked to
avoid uninitialized data.

Consideration was also given towards having a 'qemu-img measure
--bitmaps' which errors out when bitmaps are not possible, and
otherwise sums the bitmaps into the existing allocation totals rather
than displaying as a separate field, as a potential convenience
factor.  But this was ultimately decided to be more complexity than
necessary when the QMP interface was sufficient enough with bitmaps
remaining a separate field.

See also: https://bugzilla.redhat.com/1779904

Reported-by: Nir Soffer <nsoffer@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200521192137.1120211-3-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2020-05-28 13:16:16 -05:00
Vladimir Sementsov-Ogievskiy 7ae89a0de9 block/dirty-bitmap: add bdrv_has_named_bitmaps helper
To be used for bitmap migration in further commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200521220648.3255-3-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-05-28 13:15:22 -05:00
Eric Blake bb4e58c613 blockdev: Split off basic bitmap operations for qemu-img
Upcoming patches want to add some basic bitmap manipulation abilities
to qemu-img.  But blockdev.o is too heavyweight to link into qemu-img
(among other things, it would drag in block jobs and transaction
support - qemu-img does offline manipulation, where atomicity is less
important because there are no concurrent modifications to compete
with), so it's time to split off the bare bones of what we will need
into a new file block/monitor/bitmap-qmp-cmds.o.

This is sufficient to expose 6 QMP commands for use by qemu-img (add,
remove, clear, enable, disable, merge), as well as move the three
helper functions touched in the previous patch.  Regarding
MAINTAINERS, the new file is automatically part of block core, but
also makes sense as related to other dirty bitmap files.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513011648.166876-6-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2020-05-19 10:32:14 -05:00
Eric Blake ef893b5c84 block: Make it easier to learn which BDS support bitmaps
Upcoming patches will enhance bitmap support in qemu-img, but in doing
so, it turns out to be nice to suppress output when persistent bitmaps
make no sense (such as on a qcow2 v2 image).  Add a hook to make this
easier to query.

This patch adds a new callback .bdrv_supports_persistent_dirty_bitmap,
rather than trying to shoehorn the answer in via existing callbacks.
In particular, while it might have been possible to overload
.bdrv_co_can_store_new_dirty_bitmap to special-case a NULL input to
answer whether any persistent bitmaps are supported, that is at odds
with whether a particular bitmap can be stored (for example, even on
an image that supports persistent bitmaps but has currently filled up
the maximum number of bitmaps, attempts to store another one should
fail); and the new functionality doesn't require coroutine safety.
Similarly, we could have added one more piece of information to
.bdrv_get_info, but then again, most callers to that function tend to
already discard extraneous information, and making it a catch-all
rather than a series of dedicated scalar queries hasn't really
simplified life.

In the future, when we improve the ability to look up bitmaps through
a filter, we will probably also want to teach the block layer to
automatically let filters pass this request on through.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513011648.166876-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2020-05-19 10:32:14 -05:00
Philippe Mathieu-Daudé d7eca54222 block/block-copy: Simplify block_copy_do_copy()
block_copy_do_copy() is static, only used in block_copy_task_entry
with the error_is_read argument set. No need to check for it,
simplify.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200507121129.29760-3-philmd@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Philippe Mathieu-Daudé c78dd00e35 block/block-copy: Fix uninitialized variable in block_copy_task_entry
Fix when building with -Os:

    CC      block/block-copy.o
  block/block-copy.c: In function ‘block_copy_task_entry’:
  block/block-copy.c:428:38: error: ‘error_is_read’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
    428 |         t->call_state->error_is_read = error_is_read;
        |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200507121129.29760-2-philmd@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz e5d8a40685 block: Drop @child_class from bdrv_child_perm()
Implementations should decide the necessary permissions based on @role.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-35-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz 1f38f04eac block: Pass BdrvChildRole in remaining cases
These calls have no real use for the child role yet, but it will not
harm to give one.

Notably, the bdrv_root_attach_child() call in blockjob.c is left
unmodified because there is not much the generic BlockJob object wants
from its children.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-34-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz 69dca43d6b block: Use bdrv_default_perms()
bdrv_default_perms() can decide which permission profile to use based on
the BdrvChildRole, so block drivers do not need to select it explicitly.

The blkverify driver now no longer shares the WRITE permission for the
image to verify.  We thus have to adjust two places in
test-block-iothread not to take it.  (Note that in theory, blkverify
should behave like quorum in this regard and share neither WRITE nor
RESIZE for both of its children.  In practice, it does not really
matter, because blkverify is used only for debugging, so we might as
well keep its permissions rather liberal.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-30-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz 58944401d6 block: Use child_of_bds in remaining places
Replace child_file by child_of_bds in all remaining places (excluding
tests).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-28-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz b3af2af43b block: Make filter drivers use child_of_bds
Note that some filters have secondary children, namely blkverify (the
image to be verified) and blklogwrites (the log).  This patch does not
touch those children.

Note that for blkverify, the filtered child should not be format-probed.
While there is nothing enforcing this here, in practice, it will not be:
blkverify implements .bdrv_file_open.  The block layer ensures (and in
fact, asserts) that BDRV_O_PROTOCOL is set for every BDS whose driver
implements .bdrv_file_open.  This flag will then be bequeathed to
blkverify's children, and they will thus (by default) not be probed
either.

("By default" refers to the fact that blkverify's other child (the
non-filtered one) will have BDRV_O_PROTOCOL force-unset, because that is
what happens for all non-filtered children of non-format drivers.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-27-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz 8b1869daad block: Make format drivers use child_of_bds
Commonly, they need to pass the BDRV_CHILD_IMAGE set as the
BdrvChildRole; but there are exceptions for drivers with external data
files (qcow2 and vmdk).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-26-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz 25191e5ff0 block: Make backing files child_of_bds children
Make all parents of backing files pass the appropriate BdrvChildRole.
By doing so, we can switch their BdrvChildClass over to the generic
child_of_bds, which will do the right thing when given a correct
BdrvChildRole.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-24-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz 36ee58d13b block: Switch child_format users to child_of_bds
Both users (quorum and blkverify) use child_format for
not-really-filtered children, so the appropriate BdrvChildRole in both
cases is DATA.  (Note that this will cause bdrv_inherited_options() to
force-allow format probing.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-22-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz 500e243420 raw-format: Split raw_read_options()
Split raw_read_options() into one function that actually just reads the
options, and another that applies them.  This will allow us to detect
whether the user has specified any options before attaching the file
child (so we can decide on its role based on the options).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-21-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz 3cdc69d31b block: Pass parent_is_format to .inherit_options()
We plan to unify the generic .inherit_options() functions.  The
resulting common function will need to decide whether to force-enable
format probing, force-disable it, or leave it as-is.  To make this
decision, it will need to know whether the parent node is a format node
or not (because we never want format probing if the parent is a format
node already (except for the backing chain)).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-9-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz 272c02eaef block: Pass BdrvChildRole to .inherit_options()
For now, all callers (effectively) pass 0 and no callee evaluates thie
value.  Later patches will change both.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-8-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz bf8e925eb5 block: Pass BdrvChildRole to bdrv_child_perm()
For now, all callers pass 0 and no callee evaluates this value.  Later
patches will change both.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-7-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz 258b776515 block: Add BdrvChildRole to BdrvChild
For now, it is always set to 0.  Later patches in this series will
ensure that all callers pass an appropriate combination of flags.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-6-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz bd86fb990c block: Rename BdrvChildRole to BdrvChildClass
This structure nearly only contains parent callbacks for child state
changes.  It cannot really reflect a child's role, because different
roles may overlap (as we will see when real roles are introduced), and
because parents can have custom callbacks even when the child fulfills a
standard role.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200513110544.176672-4-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz d67066d8bc block: Add BlockDriver.is_format
We want to unify child_format and child_file at some point.  One of the
important things that set format drivers apart from other drivers is
that they do not expect other format nodes under them (except in the
backing chain), i.e. we must not probe formats inside of formats.  That
means we need something on which to distinguish format drivers from
others, and hence this flag.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200513110544.176672-3-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz 6540fd153c block: Mark commit, mirror, blkreplay as filters
The commit, mirror, and blkreplay block nodes are filters, so they should
be marked as such.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-2-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz f844ec01b3 block: Use bdrv_make_empty() where possible
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200429141126.85159-3-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Kevin Wolf 6ecbc6c526 replication: Avoid blk_make_empty() on read-only child
This is just a bandaid to keep tests/test-replication working after
bdrv_make_empty() starts to assert that we're not trying to call it on a
read-only child.

For the real solution in the future, replication should not steal the
BdrvChild from its backing file (this is never correct to do!), but
instead have its own child node references, with the appropriate
permissions.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz 2d97fde439 block: Use blk_make_empty() after commits
bdrv_commit() already has a BlockBackend pointing to the BDS that we
want to empty, it just has the wrong permissions.

qemu-img commit has no BlockBackend pointing to the old backing file
yet, but introducing one is simple.

After this commit, bdrv_make_empty() is the only remaining caller of
BlockDriver.bdrv_make_empty().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200429141126.85159-5-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[kwolf: Fixed up reference output for 098]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Max Reitz 2b7bbdbdef block: Add blk_make_empty()
Two callers of BlockDriver.bdrv_make_empty() remain that should not call
this method directly.  Both do not have access to a BdrvChild, but they
can use a BlockBackend, so we add this function that lets them use it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200429141126.85159-4-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Lukas Straub e140f4b7b8 block/replication.c: Avoid cancelling the job twice
If qemu in colo secondary mode is stopped, it crashes because
s->backup_job is canceled twice: First with job_cancel_sync_all()
in qemu_cleanup() and then in replication_stop().

Fix this by assigning NULL to s->backup_job when the job completes
so replication_stop() and replication_do_checkpoint() won't touch
the job.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Message-Id: <20200511090801.7ed5d8f3@luklap>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:25 +02:00
Kevin Wolf e83dd6808c mirror: Make sure that source and target size match
If the target is shorter than the source, mirror would copy data until
it reaches the end of the target and then fail with an I/O error when
trying to write past the end.

If the target is longer than the source, the mirror job would complete
successfully, but the target wouldn't actually be an accurate copy of
the source image (it would contain some additional garbage at the end).

Fix this by checking that both images have the same size when the job
starts.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200511135825.219437-4-kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18 19:05:24 +02:00
Markus Armbruster d2623129a7 qom: Drop parameter @errp of object_property_add() & friends
The only way object_property_add() can fail is when a property with
the same name already exists.  Since our property names are all
hardcoded, failure is a programming error, and the appropriate way to
handle it is passing &error_abort.

Same for its variants, except for object_property_add_child(), which
additionally fails when the child already has a parent.  Parentage is
also under program control, so this is a programming error, too.

We have a bit over 500 callers.  Almost half of them pass
&error_abort, slightly fewer ignore errors, one test case handles
errors, and the remaining few callers pass them to their own callers.

The previous few commits demonstrated once again that ignoring
programming errors is a bad idea.

Of the few ones that pass on errors, several violate the Error API.
The Error ** argument must be NULL, &error_abort, &error_fatal, or a
pointer to a variable containing NULL.  Passing an argument of the
latter kind twice without clearing it in between is wrong: if the
first call sets an error, it no longer points to NULL for the second
call.  ich9_pm_add_properties(), sparc32_ledma_realize(),
sparc32_dma_realize(), xilinx_axidma_realize(), xilinx_enet_realize()
are wrong that way.

When the one appropriate choice of argument is &error_abort, letting
users pick the argument is a bad idea.

Drop parameter @errp and assert the preconditions instead.

There's one exception to "duplicate property name is a programming
error": the way object_property_add() implements the magic (and
undocumented) "automatic arrayification".  Don't drop @errp there.
Instead, rename object_property_add() to object_property_try_add(),
and add the obvious wrapper object_property_add().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200505152926.18877-15-armbru@redhat.com>
[Two semantic rebase conflicts resolved]
2020-05-15 07:07:58 +02:00
Vladimir Sementsov-Ogievskiy fc9aefc8c0 block/block-copy: fix use-after-free of task pointer
Obviously, we should g_free the task after trace point and offset
update.

Reported-by: Coverity (CID 1428756)
Fixes: 4ce5dd3e9b
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200507183800.22626-1-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-05-13 14:20:31 +02:00
Denis Plotnikov d298ac10ad qcow2: add zstd cluster compression
zstd significantly reduces cluster compression time.
It provides better compression performance maintaining
the same level of the compression ratio in comparison with
zlib, which, at the moment, is the only compression
method available.

The performance test results:
Test compresses and decompresses qemu qcow2 image with just
installed rhel-7.6 guest.
Image cluster size: 64K. Image on disk size: 2.2G

The test was conducted with brd disk to reduce the influence
of disk subsystem to the test results.
The results is given in seconds.

compress cmd:
  time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib|zstd]
                  src.img [zlib|zstd]_compressed.img
decompress cmd
  time ./qemu-img convert -O qcow2
                  [zlib|zstd]_compressed.img uncompressed.img

           compression               decompression
         zlib       zstd           zlib         zstd
------------------------------------------------------------
real     65.5       16.3 (-75 %)    1.9          1.6 (-16 %)
user     65.0       15.8            5.3          2.5
sys       3.3        0.2            2.0          2.0

Both ZLIB and ZSTD gave the same compression ratio: 1.57
compressed image size in both cases: 1.4G

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
QAPI part:
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200507082521.29210-4-dplotnikov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-05-13 14:20:31 +02:00
Denis Plotnikov 25dd077d1d qcow2: rework the cluster compression routine
The patch enables processing the image compression type defined
for the image and chooses an appropriate method for image clusters
(de)compression.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200507082521.29210-3-dplotnikov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-05-13 14:20:31 +02:00
Denis Plotnikov 572ad9783f qcow2: introduce compression type feature
The patch adds some preparation parts for incompatible compression type
feature to qcow2 allowing the use different compression methods for
image clusters (de)compressing.

It is implied that the compression type is set on the image creation and
can be changed only later by image conversion, thus compression type
defines the only compression algorithm used for the image, and thus,
for all image clusters.

The goal of the feature is to add support of other compression methods
to qcow2. For example, ZSTD which is more effective on compression than ZLIB.

The default compression is ZLIB. Images created with ZLIB compression type
are backward compatible with older qemu versions.

Adding of the compression type breaks a number of tests because now the
compression type is reported on image creation and there are some changes
in the qcow2 header in size and offsets.

The tests are fixed in the following ways:
    * filter out compression_type for many tests
    * fix header size, feature table size and backing file offset
      affected tests: 031, 036, 061, 080
      header_size +=8: 1 byte compression type
                       7 bytes padding
      feature_table += 48: incompatible feature compression type
      backing_file_offset += 56 (8 + 48 -> header_change + feature_table_change)
    * add "compression type" for test output matching when it isn't filtered
      affected tests: 049, 060, 061, 065, 082, 085, 144, 182, 185, 198, 206,
                      242, 255, 274, 280

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
QAPI part:
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200507082521.29210-2-dplotnikov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-05-13 14:20:31 +02:00
Eric Blake 47e0b38a13 block: Drop unused .bdrv_has_zero_init_truncate
Now that there are no clients of bdrv_has_zero_init_truncate, none of
the drivers need to worry about providing it.

What's more, this eliminates a source of some confusion: a literal
reading of the documentation as written in ceaca56f and implemented in
commit 1dcaf527 claims that a driver which returns 0 for
bdrv_has_zero_init_truncate() must not return 1 for
bdrv_has_zero_init(); this condition was violated for parallels, qcow,
and sometimes for vdi, although in practice it did not matter since
those drivers also lacked .bdrv_co_truncate.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200428202905.770727-10-eblake@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-08 13:26:35 +02:00
Eric Blake dbc636e791 vhdx: Rework truncation logic
The vhdx driver uses truncation for image growth, with a special case
for blocks that already read as zero but which are only being
partially written.  But with a bit of rearranging, it's just as easy
to defer the decision on whether truncation resulted in zeroes to the
actual allocation attempt, reducing the number of places that still
use bdrv_has_zero_init_truncate.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200428202905.770727-9-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-08 13:26:35 +02:00
Eric Blake bda4cdcbb9 parallels: Rework truncation logic
The parallels driver tries to use truncation for image growth, but can
only do so when reads are guaranteed as zero.  Now that we have a way
to request zero contents from truncation, we can defer the decision to
actual allocation attempts rather than up front, reducing the number
of places that still use bdrv_has_zero_init_truncate.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200428202905.770727-8-eblake@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-08 13:26:35 +02:00
Eric Blake be9c9404db ssh: Support BDRV_REQ_ZERO_WRITE for truncate
Our .bdrv_has_zero_init_truncate can detect when the remote side
always zero fills; we can reuse that same knowledge to implement
BDRV_REQ_ZERO_WRITE by ignoring it when the server gives it to us for
free.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200428202905.770727-7-eblake@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-08 13:26:35 +02:00
Eric Blake fec00559e7 sheepdog: Support BDRV_REQ_ZERO_WRITE for truncate
Our .bdrv_has_zero_init_truncate always returns 1 because sheepdog
always 0-fills; we can use that same knowledge to implement
BDRV_REQ_ZERO_WRITE by ignoring it.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200428202905.770727-6-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-08 13:26:35 +02:00
Eric Blake 2f98910d5b rbd: Support BDRV_REQ_ZERO_WRITE for truncate
Our .bdrv_has_zero_init_truncate always returns 1 because rbd always
0-fills; we can use that same knowledge to implement
BDRV_REQ_ZERO_WRITE by ignoring it.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200428202905.770727-5-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-08 13:26:35 +02:00
Eric Blake 8f23aaf5d6 nfs: Support BDRV_REQ_ZERO_WRITE for truncate
Our .bdrv_has_zero_init_truncate returns 1 if we detect that the OS
always 0-fills; we can use that same knowledge to implement
BDRV_REQ_ZERO_WRITE by ignoring it when the OS gives it to us for
free.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200428202905.770727-4-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-08 13:26:35 +02:00
Eric Blake 8e51979504 file-win32: Support BDRV_REQ_ZERO_WRITE for truncate
When using bdrv_file, .bdrv_has_zero_init_truncate always returns 1;
therefore, we can behave just like file-posix, and always implement
BDRV_REQ_ZERO_WRITE by ignoring it since the OS gives it to us for
free (note that file-posix.c had to use an 'if' because it shared code
between regular files and block devices, but in file-win32.c,
bdrv_host_device uses a separate .bdrv_file_open).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200428202905.770727-3-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-08 13:26:35 +02:00
Eric Blake 5e09bcee5b gluster: Drop useless has_zero_init callback
block.c already defaults to 0 if we don't provide a callback; there's
no need to write a callback that always fails.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200428202905.770727-2-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-08 13:26:35 +02:00
Max Reitz 4b96fa3846 qcow2: Fix preallocation on block devices
Calling bdrv_getlength() to get the pre-truncate file size will not
really work on block devices, because they have always the same length,
and trying to write beyond it will fail with a rather cryptic error
message.

Instead, we should use qcow2_get_last_cluster() and bdrv_getlength()
only as a fallback.

Before this patch:
$ truncate -s 1G test.img
$ sudo losetup -f --show test.img
/dev/loop0
$ sudo qemu-img create -f qcow2 -o preallocation=full /dev/loop0 64M
Formatting '/dev/loop0', fmt=qcow2 size=67108864 cluster_size=65536
preallocation=full lazy_refcounts=off refcount_bits=16
qemu-img: /dev/loop0: Could not resize image: Failed to resize refcount
structures: No space left on device

With this patch:
$ sudo qemu-img create -f qcow2 -o preallocation=full /dev/loop0 64M
Formatting '/dev/loop0', fmt=qcow2 size=67108864 cluster_size=65536
preallocation=full lazy_refcounts=off refcount_bits=16
qemu-img: /dev/loop0: Could not resize image: Failed to resize
underlying file: Preallocation mode 'full' unsupported for this
non-regular file

So as you can see, it still fails, but now the problem is missing
support on the block device level, so we at least get a better error
message.

Note that we cannot preallocate block devices on truncate by design,
because we do not know what area to preallocate.  Their length is always
the same, the truncate operation does not change it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200505141801.1096763-1-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-08 13:26:35 +02:00
Kevin Wolf 958a04bd32 backup: Make sure that source and target size match
Since the introduction of a backup filter node in commit 00e30f05d, the
backup block job crashes when the target image is smaller than the
source image because it will try to write after the end of the target
node without having BLK_PERM_RESIZE. (Previously, the BlockBackend layer
would have caught this and errored out gracefully.)

We can fix this and even do better than the old behaviour: Check that
source and target have the same image size at the start of the block job
and unshare BLK_PERM_RESIZE. (This permission was already unshared
before the same commit 00e30f05d, but the BlockBackend that was used to
make the restriction was removed without a replacement.) This will
immediately error out when starting the job instead of only when writing
to a block that doesn't exist in the target.

Longer target than source would technically work because we would never
write to blocks that don't exist, but semantically these are invalid,
too, because a backup is supposed to create a copy, not just an image
that starts with a copy.

Fixes: 00e30f05de
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1778593
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200430142755.315494-4-kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-08 13:26:35 +02:00
Kevin Wolf 58226634c4 backup: Improve error for bdrv_getlength() failure
bdrv_get_device_name() will be an empty string with modern management
tools that don't use -drive. Use bdrv_get_device_or_node_name() instead
so that the node name is used if the BlockBackend is anonymous.

While at it, start with upper case to make the message consistent with
the rest of the function.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200430142755.315494-3-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-08 13:26:35 +02:00
Kevin Wolf 2758be056b vmdk: Flush only once in vmdk_L2update()
If we have a backup L2 table, we currently flush once after writing to
the active L2 table and again after writing to the backup table. A
single flush is enough and makes things a little less slow.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200430133007.170335-6-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-08 13:26:35 +02:00
Kevin Wolf 78cae78dbc vmdk: Don't update L2 table for zero write on zero cluster
If a cluster is already zeroed, we don't have to call vmdk_L2update(),
which is rather slow because it flushes the image file.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200430133007.170335-5-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-08 13:26:35 +02:00
Kevin Wolf 4823cde58e vmdk: Fix partial overwrite of zero cluster
When overwriting a zero cluster, we must not perform copy-on-write from
the backing file, but from a zeroed buffer.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200430133007.170335-4-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-08 13:26:35 +02:00
Kevin Wolf 2821c1cc0f vmdk: Fix zero cluster allocation
m_data must contain valid data even for zero clusters when no cluster
was allocated in the image file. Without this, zero writes segfault with
images that have zeroed_grain=on.

For zero writes, we don't want to allocate a cluster in the image file
even in compressed files.

Fixes: 524089bce4
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200430133007.170335-3-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-08 13:26:35 +02:00
Kevin Wolf 4dc20e6465 vmdk: Rename VmdkMetaData.valid to new_allocation
m_data is used for zero clusters even though valid == 0. It really only
means that a new cluster was allocated in the image file. Rename it to
reflect this.

While at it, change it from int to bool, too.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200430133007.170335-2-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-08 13:26:35 +02:00
Alberto Garcia e4d7019e1a qcow2: Avoid integer wraparound in qcow2_co_truncate()
After commit f01643fb8b when an image is
extended and BDRV_REQ_ZERO_WRITE is set then the new clusters are
zeroized.

The code however does not detect correctly situations when the old and
the new end of the image are within the same cluster. The problem can
be reproduced with these steps:

   qemu-img create -f qcow2 backing.qcow2 1M
   qemu-img create -f qcow2 -F qcow2 -b backing.qcow2 top.qcow2
   qemu-img resize --shrink top.qcow2 520k
   qemu-img resize top.qcow2 567k

In the last step offset - zero_start causes an integer wraparound.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200504155217.10325-1-berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-08 13:26:35 +02:00
Maxim Levitsky 3d1900a471 block: luks: better error message when creating too large files
Currently if you attampt to create too large file with luks you
get the following error message:

Formatting 'test.luks', fmt=luks size=17592186044416 key-secret=sec0
qemu-img: test.luks: Could not resize file: File too large

While for raw format the error message is
qemu-img: test.img: The image size is too large for file format 'raw'

The reason for this is that qemu-img checks for errono of the failure,
and presents the later error when it is -EFBIG

However crypto generic code 'swallows' the errno and replaces it
with -EIO.

As an attempt to make it better, we can make luks driver,
detect -EFBIG and in this case present a better error message,
which is what this patch does

The new error message is:

qemu-img: error creating test.luks: The requested file size is too large

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534898
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-05-07 12:52:33 +01:00
Peter Maydell ea1329bb3a Block patches:
- Asynchronous copying for block-copy (i.e., the backup job)
 - Allow resizing of qcow2 images when they have internal snapshots
 - iotests: Logging improvements for Python tests
 - iotest 153 fix, and block comment cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQFGBAABCAAwFiEEkb62CjDbPohX0Rgp9AfbAGHVz0AFAl6xYpoSHG1yZWl0ekBy
 ZWRoYXQuY29tAAoJEPQH2wBh1c9AF7IH/j1wr6m8OrZtdAoebVqFQn2buydT+kGP
 DP4IVfJ4YoTwmZCBxoR5ZuH6kWTWgUDz1w5x7U5A70tVLqK+RwCnAxrlz19s6rVP
 ACp1d6GO7iLAEH58KRsvSvy7OTvpKzEXP8tS8kDsK58xl8m65vSBLIt9xUpZMql2
 VAiftyu7MYGpObEoy2SnTuhM5H9pPcNiuwATGLNrJqdmi+bW8sIKiV3+yND7s1cz
 lWqVHaWq0XUfBaSG4sx6BqBeBl+cchv0XyRrwNJefY3lVLxIiizwxmb7UYBRSBrY
 XIKGOl1qZNE0AETuF2wh7XzTJuHe7Asash5dIIip0sqtbiToVlSxQwY=
 =vtg/
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2020-05-05' into staging

Block patches:
- Asynchronous copying for block-copy (i.e., the backup job)
- Allow resizing of qcow2 images when they have internal snapshots
- iotests: Logging improvements for Python tests
- iotest 153 fix, and block comment cleanups

# gpg: Signature made Tue 05 May 2020 13:56:58 BST
# gpg:                using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40
# gpg:                issuer "mreitz@redhat.com"
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full]
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* remotes/maxreitz/tags/pull-block-2020-05-05: (24 commits)
  block/block-copy: use aio-task-pool API
  block/block-copy: refactor task creation
  block/block-copy: add state pointer to BlockCopyTask
  block/block-copy: alloc task on each iteration
  block/block-copy: rename in-flight requests to tasks
  Fix iotest 153
  block: Comment cleanups
  qcow2: Tweak comment about bitmaps vs. resize
  qcow2: Allow resize of images with internal snapshots
  block: Add blk_new_with_bs() helper
  iotests: use python logging for iotests.log()
  iotests: Mark verify functions as private
  iotest 258: use script_main
  iotests: add script_initialize
  iotests: add hmp helper with logging
  iotests: limit line length to 79 chars
  iotests: touch up log function signature
  iotests: drop pre-Python 3.4 compatibility code
  iotests: alphabetize standard imports
  iotests: add pylintrc file
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-05-05 16:46:37 +01:00
Peter Maydell f19d118bed nbd patches for 2020-05-04
- reduce client-side fragmentation of NBD trim and status requests
 - fix iotest 41 when run in deep tree
 - fix socket activation in qemu-nbd
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAl6whTUACgkQp6FrSiUn
 Q2r4nAf7BtGSFMkUu6nWYeq+Ggg+Xwmz2FLAzWTK/rccGDC44c9ETzOIbWEddo6X
 FHpU07VXdLW1h2M7ox8lQVo0DZEFxTRBYTPtUtjB7izfkAs4CkYeElJsZAPAZKgU
 GsKqa3RM6uXubsQaXXXjMFCGlYgqi1dVkmkgtPebt7evSe0ATlTfYfd0y9gb5f9C
 cbHD3CVcGKQe4ZtNcSBpTzOvXJSrBZznyCyhBO2qmVXTynt/5Ygog+Ulq3DHZsPX
 UkRkTPohKA0BhXuS7wD49danlzCLiTlvswr62fAncM1+AJTbmIa+apy3SwiOkwMh
 Aawq5vDtaFV+HEBKbMC0QRhgtoEe1w==
 =ExlI
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2020-05-04' into staging

nbd patches for 2020-05-04

- reduce client-side fragmentation of NBD trim and status requests
- fix iotest 41 when run in deep tree
- fix socket activation in qemu-nbd

# gpg: Signature made Mon 04 May 2020 22:12:21 BST
# gpg:                using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full]
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full]
# gpg:                 aka "[jpeg image of size 6874]" [full]
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-nbd-2020-05-04:
  block/nbd-client: drop max_block restriction from discard
  block/nbd-client: drop max_block restriction from block_status
  iotests/041: Fix NBD socket path
  tools: Fix use of fcntl(F_SETFD) during socket activation

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-05-05 15:47:44 +01:00
Peter Maydell a2261b2754 trivial patches (20200504)
Silent static analyzer warning
 Remove dead assignments
 Support -chardev serial on macOS
 Update MAINTAINERS
 Some cosmetic changes
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEzS913cjjpNwuT1Fz8ww4vT8vvjwFAl6wOI4SHGxhdXJlbnRA
 dml2aWVyLmV1AAoJEPMMOL0/L748p7UQAIFSNN0FrDV+K7i8qqq0X+JrS+dNOHNm
 DSpOf8IaGm/BezzL6XirXBVpFxg9iB5DQVLsjP1kUggO7rbBO0blx5H5eOPhnXZj
 xg60kLN16ty7NZ/WPS1G9jF4nDsjz0ZUtCXb0OXsuGJIOrsmN2r/lxdJwcjHZaqJ
 RzbcCSFXlvL0g7mOakJinMJH5r/nWCiUoEYsikhP10DcvuSBoCnjr+LYV6Ef02G0
 Y5lgKN2G0EAMgWTJaL3gIF27zS8QLDNll+eO+PIU5K4yo75/wRCKr4e3PpErZlf6
 B+hCAAPnXCpDKw+8sK2z+9OZXUGe1hQ8LHNgNNM921C66f+vLLXpIDTAECihM4K4
 0wThYlFDwT4j+PMHFNlzIobGMtb33ui8m40lepMt/YOVFqY4tr8u3MLhHkVDo2+8
 sNuOOWLXAoFOYyRqgTeVJvZvMUFQqtDiftghw1BR55TyIpDWjvLYRqae5CI+MGXs
 6YylZVHGzVjMVptxvivvIQ735Nq8LaKq7N8Cb7uvcbRaCki39BsxXVPZx4p6NdwN
 dMndUOz/y75dNlRMDjK8l/oRFPJa/p1Yz8mZhl0uVOO6JeJhBwYmk+WkQ7g/GHZb
 Rx15HnVWRu6C/Icbw4kqZYyqrgl5lykS8aAWURePdpjzKY77rY1H71FesMhjifRN
 ZGgfUdWI88M4
 =ibgH
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/vivier2/tags/trivial-branch-for-5.1-pull-request' into staging

trivial patches (20200504)

Silent static analyzer warning
Remove dead assignments
Support -chardev serial on macOS
Update MAINTAINERS
Some cosmetic changes

# gpg: Signature made Mon 04 May 2020 16:45:18 BST
# gpg:                using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C
# gpg:                issuer "laurent@vivier.eu"
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full]
# gpg:                 aka "Laurent Vivier <laurent@vivier.eu>" [full]
# gpg:                 aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full]
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* remotes/vivier2/tags/trivial-branch-for-5.1-pull-request:
  hw/timer/pxa2xx_timer: Add assertion to silent static analyzer warning
  hw/timer/stm32f2xx_timer: Remove dead assignment
  hw/gpio/aspeed_gpio: Remove dead assignment
  hw/isa/i82378: Remove dead assignment
  hw/ide/sii3112: Remove dead assignment
  hw/input/adb-kbd: Remove dead assignment
  hw/i2c/pm_smbus: Remove dead assignment
  blockdev: Remove dead assignment
  block: Avoid dead assignment
  Compress lines for immediate return
  chardev: Add macOS to list of OSes that support -chardev serial
  MAINTAINERS: Update Keith Busch's email address
  elf_ops: Don't try to g_mapped_file_unref(NULL)
  hw/mem/pc-dimm: Fix line over 80 characters warning
  hw/mem/pc-dimm: Print slot number on error at pc_dimm_pre_plug()
  MAINTAINERS: Mark the LatticeMico32 target as orphan
  timer/exynos4210_mct: Remove redundant statement in exynos4210_mct_write()
  display/blizzard: use extract16() for fix clang analyzer warning in blizzard_draw_line16_32()
  scsi/esp-pci: add g_assert() for fix clang analyzer warning in esp_pci_io_write()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-05-05 14:03:28 +01:00
Vladimir Sementsov-Ogievskiy 4ce5dd3e9b block/block-copy: use aio-task-pool API
Run block_copy iterations in parallel in aio tasks.

Changes:
  - BlockCopyTask becomes aio task structure. Add zeroes field to pass
    it to block_copy_do_copy
  - add call state - it's a state of one call of block_copy(), shared
    between parallel tasks. For now used only to keep information about
    first error: is it read or not.
  - convert block_copy_dirty_clusters to aio-task loop.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200429130847.28124-6-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-05-05 14:03:28 +02:00
Vladimir Sementsov-Ogievskiy 42ac214406 block/block-copy: refactor task creation
Instead of just relying on the comment "Called only on full-dirty
region" in block_copy_task_create() let's move initial dirty area
search directly to block_copy_task_create(). Let's also use effective
bdrv_dirty_bitmap_next_dirty_area instead of looping through all
non-dirty clusters.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200429130847.28124-5-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-05-05 14:03:28 +02:00
Vladimir Sementsov-Ogievskiy 1348a65774 block/block-copy: add state pointer to BlockCopyTask
We are going to use aio-task-pool API, so we'll need state pointer in
BlockCopyTask anyway. Add it now and use where possible.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200429130847.28124-4-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-05-05 14:03:28 +02:00
Vladimir Sementsov-Ogievskiy f13e60a973 block/block-copy: alloc task on each iteration
We are going to use aio-task-pool API, so tasks will be handled in
parallel. We need therefore separate allocated task on each iteration.
Introduce this logic now.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200429130847.28124-3-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-05-05 14:03:28 +02:00
Vladimir Sementsov-Ogievskiy e9407785cc block/block-copy: rename in-flight requests to tasks
We are going to use aio-task-pool API and extend in-flight request
structure to be a successor of AioTask, so rename things appropriately.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200429130847.28124-2-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-05-05 14:03:28 +02:00
Eric Blake f464906951 block: Comment cleanups
It's been a while since we got rid of the sector-based bdrv_read and
bdrv_write (commit 2e11d756); let's finish the job on a few remaining
comments.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200428213807.776655-1-eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-05-05 13:17:36 +02:00
Eric Blake ee1244a2e9 qcow2: Tweak comment about bitmaps vs. resize
Our comment did not actually match the code.  Rewrite the comment to
be less sensitive to any future changes to qcow2-bitmap.c that might
implement scenarios that we currently reject.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200428192648.749066-4-eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-05-05 13:17:36 +02:00
Eric Blake 7fa140abf6 qcow2: Allow resize of images with internal snapshots
We originally refused to allow resize of images with internal
snapshots because the v2 image format did not require the tracking of
snapshot size, making it impossible to safely revert to a snapshot
with a different size than the current view of the image.  But the
snapshot size tracking was rectified in v3, and our recent fixes to
qemu-img amend (see 0a85af35) guarantee that we always have a valid
snapshot size.  Thus, we no longer need to artificially limit image
resizes, but it does become one more thing that would prevent a
downgrade back to v2.  And now that we support different-sized
snapshots, it's also easy to fix reverting to a snapshot to apply the
new size.

Upgrade iotest 61 to cover this (we previously had NO coverage of
refusal to resize while snapshots exist).  Note that the amend process
can fail but still have effects: in particular, since we break things
into upgrade, resize, downgrade, a failure during resize does not roll
back changes made during upgrade, nor does failure in downgrade roll
back a resize.  But this situation is pre-existing even without this
patch; and without journaling, the best we could do is minimize the
chance of partial failure by collecting all changes prior to doing any
writes - which adds a lot of complexity but could still fail with EIO.
On the other hand, we are careful that even if we have partial
modification but then fail, the image is left viable (that is, we are
careful to sequence things so that after each successful cluster
write, there may be transient leaked clusters but no corrupt
metadata).  And complicating the code to make it more transaction-like
is not worth the effort: a user can always request multiple 'qemu-img
amend' changing one thing each, if they need finer-grained control
over detecting the first failure than what they get by letting qemu
decide how to sequence multiple changes.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200428192648.749066-3-eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-05-05 13:17:36 +02:00
Eric Blake a3aeeab557 block: Add blk_new_with_bs() helper
There are several callers that need to create a new block backend from
an existing BDS; make the task slightly easier with a common helper
routine.

Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200424190903.522087-2-eblake@redhat.com>
[mreitz: Set @ret only in error paths, see
 https://lists.nongnu.org/archive/html/qemu-block/2020-04/msg01216.html]
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200428192648.749066-2-eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-05-05 13:17:36 +02:00
Vladimir Sementsov-Ogievskiy 714eb0dbc5 block/nbd-client: drop max_block restriction from discard
The NBD spec was updated (see nbd.git commit 9f30fedb) so that
max_block doesn't relate to NBD_CMD_TRIM. So, drop the restriction.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200401150112.9557-3-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: tweak commit message to call out NBD commit]
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-05-04 15:16:46 -05:00
Vladimir Sementsov-Ogievskiy 6bf792b464 block/nbd-client: drop max_block restriction from block_status
The NBD spec was updated (see nbd.git commit 9f30fedb) so that
max_block doesn't relate to NBD_CMD_BLOCK_STATUS. So, drop the
restriction.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200401150112.9557-2-vsementsov@virtuozzo.com>
[eblake: tweak commit message to call out NBD commit]
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-05-04 15:13:14 -05:00
Daniel Brodsky 6e8a355de6 lockable: replaced locks with lock guard macros where appropriate
- ran regexp "qemu_mutex_lock\(.*\).*\n.*if" to find targets
- replaced result with QEMU_LOCK_GUARD if all unlocks at function end
- replaced result with WITH_QEMU_LOCK_GUARD if unlock not at end

Signed-off-by: Daniel Brodsky <dnbrdsky@gmail.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-id: 20200404042108.389635-3-dnbrdsky@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-05-04 16:07:43 +01:00
Simran Singhal b3ac2b94cd Compress lines for immediate return
Compress two lines into a single line if immediate return statement is found.

It also remove variables progress, val, data, ret and sock
as they are no longer needed.

Remove space between function "mixer_load" and '(' to fix the
checkpatch.pl error:-
ERROR: space prohibited between function name and open parenthesis '('

Done using following coccinelle script:
@@
local idexpression ret;
expression e;
@@

-ret =
+return
     e;
-return ret;

Signed-off-by: Simran Singhal <singhalsimran0@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200401165314.GA3213@simran-Inspiron-5558>
[lv: in handle_aiocb_write_zeroes_unmap() move "int ret" inside the #ifdef]
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-05-04 14:43:22 +02:00
Kevin Wolf eb8a0cf3ba qcow2: Forward ZERO_WRITE flag for full preallocation
The BDRV_REQ_ZERO_WRITE is currently implemented in a way that first the
image is possibly preallocated and then the zero flag is added to all
clusters. This means that a copy-on-write operation may be needed when
writing to these clusters, despite having used preallocation, negating
one of the major benefits of preallocation.

Instead, try to forward the BDRV_REQ_ZERO_WRITE to the protocol driver,
and if the protocol driver can ensure that the new area reads as zeros,
we can skip setting the zero flag in the qcow2 layer.

Unfortunately, the same approach doesn't work for metadata
preallocation, so we'll still set the zero flag there.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200424142701.67053-1-kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-04-30 17:51:07 +02:00
Kevin Wolf 955c7d6687 block: truncate: Don't make backing file data visible
When extending the size of an image that has a backing file larger than
its old size, make sure that the backing file data doesn't become
visible in the guest, but the added area is properly zeroed out.

Consider the following scenario where the overlay is shorter than its
backing file:

    base.qcow2:     AAAAAAAA
    overlay.qcow2:  BBBB

When resizing (extending) overlay.qcow2, the new blocks should not stay
unallocated and make the additional As from base.qcow2 visible like
before this patch, but zeros should be read.

A similar case happens with the various variants of a commit job when an
intermediate file is short (- for unallocated):

    base.qcow2:     A-A-AAAA
    mid.qcow2:      BB-B
    top.qcow2:      C--C--C-

After commit top.qcow2 to mid.qcow2, the following happens:

    mid.qcow2:      CB-C00C0 (correct result)
    mid.qcow2:      CB-C--C- (before this fix)

Without the fix, blocks that previously read as zeros on top.qcow2
suddenly turn into A.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200424125448.63318-8-kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-04-30 17:51:07 +02:00
Kevin Wolf 2f0c6e7a65 file-posix: Support BDRV_REQ_ZERO_WRITE for truncate
For regular files, we always get BDRV_REQ_ZERO_WRITE behaviour from the
OS, so we can advertise the flag and just ignore it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200424125448.63318-7-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-04-30 17:51:07 +02:00
Kevin Wolf 1ddaabaecb raw-format: Support BDRV_REQ_ZERO_WRITE for truncate
The raw format driver can simply forward the flag and let its bs->file
child take care of actually providing the zeros.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200424125448.63318-6-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-04-30 17:51:07 +02:00
Kevin Wolf f01643fb8b qcow2: Support BDRV_REQ_ZERO_WRITE for truncate
If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
undo any previous preallocation, but just adds the zero flag to all
relevant L2 entries. If an external data file is in use, a write_zeroes
request to the data file is made instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200424125448.63318-5-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-04-30 17:51:07 +02:00
Kevin Wolf 8c6242b6f3 block-backend: Add flags to blk_truncate()
Now that node level interface bdrv_truncate() supports passing request
flags to the block driver, expose this on the BlockBackend level, too.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200424125448.63318-4-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-04-30 17:51:07 +02:00
Kevin Wolf 7b8e485742 block: Add flags to bdrv(_co)_truncate()
Now that block drivers can support flags for .bdrv_co_truncate, expose
the parameter in the node level interfaces bdrv_co_truncate() and
bdrv_truncate().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200424125448.63318-3-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-04-30 17:51:07 +02:00
Kevin Wolf 92b92799dc block: Add flags to BlockDriver.bdrv_co_truncate()
This adds a new BdrvRequestFlags parameter to the .bdrv_co_truncate()
driver callbacks, and a supported_truncate_flags field in
BlockDriverState that allows drivers to advertise support for request
flags in the context of truncate.

For now, we always pass 0 and no drivers declare support for any flag.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200424125448.63318-2-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-04-30 17:51:07 +02:00
Markus Armbruster 1f5842487a qapi: Only input visitors can actually fail
The previous few commits have made this more obvious, and removed the
one exception.  Time to clarify the documentation, and drop dead error
checking.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200424084338.26803-13-armbru@redhat.com>
2020-04-30 07:26:40 +02:00
Markus Armbruster 77ed971b9d block/file-posix: Fix check_cache_dropped() error handling
The Error ** argument must be NULL, &error_abort, &error_fatal, or a
pointer to a variable containing NULL.  Passing an argument of the
latter kind twice without clearing it in between is wrong: if the
first call sets an error, it no longer points to NULL for the second
call.

check_cache_dropped() calls error_setg() in a loop.  It fails to break
the loop in one instance.  If a subsequent iteration error_setg()s
again, it trips error_setv()'s assertion.

Fix it to break the loop.

Fixes: 31be8a2a97
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200422130719.28225-3-armbru@redhat.com>
2020-04-29 08:01:52 +02:00
Philippe Mathieu-Daudé 78ee6bd048 various: Remove suspicious '\' character outside of #define in C code
Fixes the following coccinelle warnings:

  $ spatch --sp-file --verbose-parsing  ... \
      scripts/coccinelle/remove_local_err.cocci
  ...
  SUSPICIOUS: a \ character appears outside of a #define at ./target/ppc/translate_init.inc.c:5213
  SUSPICIOUS: a \ character appears outside of a #define at ./target/ppc/translate_init.inc.c:5261
  SUSPICIOUS: a \ character appears outside of a #define at ./target/microblaze/cpu.c:166
  SUSPICIOUS: a \ character appears outside of a #define at ./target/microblaze/cpu.c:167
  SUSPICIOUS: a \ character appears outside of a #define at ./target/microblaze/cpu.c:169
  SUSPICIOUS: a \ character appears outside of a #define at ./target/microblaze/cpu.c:170
  SUSPICIOUS: a \ character appears outside of a #define at ./target/microblaze/cpu.c:171
  SUSPICIOUS: a \ character appears outside of a #define at ./target/microblaze/cpu.c:172
  SUSPICIOUS: a \ character appears outside of a #define at ./target/microblaze/cpu.c:173
  SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:5787
  SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:5789
  SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:5800
  SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:5801
  SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:5802
  SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:5804
  SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:5805
  SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:5806
  SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:6329
  SUSPICIOUS: a \ character appears outside of a #define at ./hw/sd/sdhci.c:1133
  SUSPICIOUS: a \ character appears outside of a #define at ./hw/scsi/scsi-disk.c:3081
  SUSPICIOUS: a \ character appears outside of a #define at ./hw/net/virtio-net.c:1529
  SUSPICIOUS: a \ character appears outside of a #define at ./hw/riscv/sifive_u.c:468
  SUSPICIOUS: a \ character appears outside of a #define at ./dump/dump.c:1895
  SUSPICIOUS: a \ character appears outside of a #define at ./block/vhdx.c:2209
  SUSPICIOUS: a \ character appears outside of a #define at ./block/vhdx.c:2215
  SUSPICIOUS: a \ character appears outside of a #define at ./block/vhdx.c:2221
  SUSPICIOUS: a \ character appears outside of a #define at ./block/vhdx.c:2222
  SUSPICIOUS: a \ character appears outside of a #define at ./block/replication.c:172
  SUSPICIOUS: a \ character appears outside of a #define at ./block/replication.c:173

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20200412223619.11284-2-f4bug@amsat.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2020-04-29 08:01:51 +02:00
Chen Qun ff0507c239 block/iscsi:fix heap-buffer-overflow in iscsi_aio_ioctl_cb
There is an overflow, the source 'datain.data[2]' is 100 bytes,
 but the 'ss' is 252 bytes.This may cause a security issue because
 we can access a lot of unrelated memory data.

The len for sbp copy data should take the minimum of mx_sb_len and
 sb_len_wr, not the maximum.

If we use iscsi device for VM backend storage, ASAN show stack:

READ of size 252 at 0xfffd149dcfc4 thread T0
    #0 0xaaad433d0d34 in __asan_memcpy (aarch64-softmmu/qemu-system-aarch64+0x2cb0d34)
    #1 0xaaad45f9d6d0 in iscsi_aio_ioctl_cb /qemu/block/iscsi.c:996:9
    #2 0xfffd1af0e2dc  (/usr/lib64/iscsi/libiscsi.so.8+0xe2dc)
    #3 0xfffd1af0d174  (/usr/lib64/iscsi/libiscsi.so.8+0xd174)
    #4 0xfffd1af19fac  (/usr/lib64/iscsi/libiscsi.so.8+0x19fac)
    #5 0xaaad45f9acc8 in iscsi_process_read /qemu/block/iscsi.c:403:5
    #6 0xaaad4623733c in aio_dispatch_handler /qemu/util/aio-posix.c:467:9
    #7 0xaaad4622f350 in aio_dispatch_handlers /qemu/util/aio-posix.c:510:20
    #8 0xaaad4622f350 in aio_dispatch /qemu/util/aio-posix.c:520
    #9 0xaaad46215944 in aio_ctx_dispatch /qemu/util/async.c:298:5
    #10 0xfffd1bed12f4 in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x512f4)
    #11 0xaaad46227de0 in glib_pollfds_poll /qemu/util/main-loop.c:219:9
    #12 0xaaad46227de0 in os_host_main_loop_wait /qemu/util/main-loop.c:242
    #13 0xaaad46227de0 in main_loop_wait /qemu/util/main-loop.c:518
    #14 0xaaad43d9d60c in qemu_main_loop /qemu/softmmu/vl.c:1662:9
    #15 0xaaad4607a5b0 in main /qemu/softmmu/main.c:49:5
    #16 0xfffd1a460b9c in __libc_start_main (/lib64/libc.so.6+0x20b9c)
    #17 0xaaad43320740 in _start (aarch64-softmmu/qemu-system-aarch64+0x2c00740)

0xfffd149dcfc4 is located 0 bytes to the right of 100-byte region [0xfffd149dcf60,0xfffd149dcfc4)
allocated by thread T0 here:
    #0 0xaaad433d1e70 in __interceptor_malloc (aarch64-softmmu/qemu-system-aarch64+0x2cb1e70)
    #1 0xfffd1af0e254  (/usr/lib64/iscsi/libiscsi.so.8+0xe254)
    #2 0xfffd1af0d174  (/usr/lib64/iscsi/libiscsi.so.8+0xd174)
    #3 0xfffd1af19fac  (/usr/lib64/iscsi/libiscsi.so.8+0x19fac)
    #4 0xaaad45f9acc8 in iscsi_process_read /qemu/block/iscsi.c:403:5
    #5 0xaaad4623733c in aio_dispatch_handler /qemu/util/aio-posix.c:467:9
    #6 0xaaad4622f350 in aio_dispatch_handlers /qemu/util/aio-posix.c:510:20
    #7 0xaaad4622f350 in aio_dispatch /qemu/util/aio-posix.c:520
    #8 0xaaad46215944 in aio_ctx_dispatch /qemu/util/async.c:298:5
    #9 0xfffd1bed12f4 in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x512f4)
    #10 0xaaad46227de0 in glib_pollfds_poll /qemu/util/main-loop.c:219:9
    #11 0xaaad46227de0 in os_host_main_loop_wait /qemu/util/main-loop.c:242
    #12 0xaaad46227de0 in main_loop_wait /qemu/util/main-loop.c:518
    #13 0xaaad43d9d60c in qemu_main_loop /qemu/softmmu/vl.c:1662:9
    #14 0xaaad4607a5b0 in main /qemu/softmmu/main.c:49:5
    #15 0xfffd1a460b9c in __libc_start_main (/lib64/libc.so.6+0x20b9c)
    #16 0xaaad43320740 in _start (aarch64-softmmu/qemu-system-aarch64+0x2c00740)

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200418062602.10776-1-kuhn.chenqun@huawei.com
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-04-20 11:31:46 +01:00
Peter Maydell 2f37b0222c Block layer patches:
- Fix crashes and hangs related to iothreads, bdrv_drain and block jobs:
     - Fix some AIO context locking in jobs
     - Fix blk->in_flight during blk_wait_while_drained()
 - vpc: Don't round up already aligned BAT sizes
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJejI1UAAoJEH8JsnLIjy/WQhEQAIF2zkkdbT77VTMLvFmPU36J
 hPR2RSzd5egNtfn3zm7vGZJ3FEZb5SEAV5R9CnwVAJeQnPUll7bjkPsoynrEUFU2
 PSoFe8lKuYwoAOOSjqakASpKx/Xs3sctKc4fgRWr5/dbWFCC77Q+qxHiKO4KYfeh
 g00jsjeCLMVLop3r9uMDovlA81mJUIP5axSjSH7akgzLC+ZCfH2RFFu62D7wYv9w
 UgQM/NZt2YuLFm+BeQLB4K2BK+PZBCN1trPj+h11ecUwm9b1XZZfO/7R0T8Wrljc
 49fd5Z/GombCnyxuLCr6QrhLZcr8FffLJXQgkdHTkWUKQXqZUfmqLjVIAbli0ZiC
 hP8jE5EgdAXpBrGeM2oH+dk0iQSBGTIMaGlhDwxO32xOAQ8TKpRiMZMfwGHqB+vn
 m/EPoHLHL6vQGxISfGjj4k3fnSP74nvRrS656MhLuG03SkPZacyTZQpTkErg2imW
 AeU6g4afvvLtJBoF/YYA5Qhff1Ux5eh9jactIW1DRf/Q4tTc4ioTU25560Le4eGZ
 kex/AwIcf9P47eTUCP8L6iNKz8RU7bV4g9vl9zz7fQm3i9GEhly88XvOppnXRUvT
 XdkfdlSmUZ9vue6rAfgsL5fQIHtsGRfH90nT11/IW1X4baOImtcQBWg3xZdR4zPS
 W2H0J01PlSKE8l/OWQlo
 =oODf
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- Fix crashes and hangs related to iothreads, bdrv_drain and block jobs:
    - Fix some AIO context locking in jobs
    - Fix blk->in_flight during blk_wait_while_drained()
- vpc: Don't round up already aligned BAT sizes

# gpg: Signature made Tue 07 Apr 2020 15:25:24 BST
# gpg:                using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream:
  vpc: Don't round up already aligned BAT sizes
  block: Fix blk->in_flight during blk_wait_while_drained()
  block: Increase BB.in_flight for coroutine and sync interfaces
  block-backend: Reorder flush/pdiscard function definitions
  backup: don't acquire aio_context in backup_clean
  replication: assert we own context before job_cancel_sync
  job: take each job's lock individually in job_txn_apply

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-04-07 19:12:45 +01:00
Kevin Wolf 3f6de653b9 vpc: Don't round up already aligned BAT sizes
As reported on Launchpad, Azure apparently doesn't accept images for
upload that are not both aligned to 1 MB blocks and have a BAT size that
matches the image size exactly.

As far as I can tell, there is no real reason why we create a BAT that
is one entry longer than necessary for aligned image sizes, so change
that.

(Even though the condition is only mentioned as "should" in the spec and
previous products accepted larger BATs - but we'll try to maintain
compatibility with as many of Microsoft's ever-changing interpretations
of the VHD spec as possible.)

Fixes: https://bugs.launchpad.net/bugs/1870098
Reported-by: Tobias Witek
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200402093603.2369-1-kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-04-07 15:42:08 +02:00
Kevin Wolf 7f16476fab block: Fix blk->in_flight during blk_wait_while_drained()
Waiting in blk_wait_while_drained() while blk->in_flight is increased
for the current request is wrong because it will cause the drain
operation to deadlock.

This patch makes sure that blk_wait_while_drained() is called with
blk->in_flight increased exactly once for the current request, and that
it temporarily decreases the counter while it waits.

Fixes: cf3129323f
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200407121259.21350-4-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-04-07 15:40:57 +02:00
Kevin Wolf fbb92b6798 block: Increase BB.in_flight for coroutine and sync interfaces
External callers of blk_co_*() and of the synchronous blk_*() functions
don't currently increase the BlockBackend.in_flight counter, but calls
from blk_aio_*() do, so there is an inconsistency whether the counter
has been increased or not.

This patch moves the actual operations to static functions that can
later know they will always be called with in_flight increased exactly
once, even for external callers using the blk_co_*() coroutine
interfaces.

If the public blk_co_*() interface is unused, remove it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200407121259.21350-3-kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-04-07 15:40:41 +02:00
Kevin Wolf 564806c529 block-backend: Reorder flush/pdiscard function definitions
Move all variants of the flush/pdiscard functions to a single place and
put the blk_co_*() version first because it is called by all other
variants (and will become static in the next patch).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200407121259.21350-2-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-04-07 15:40:28 +02:00
Stefan Reiter eca0f3524a backup: don't acquire aio_context in backup_clean
All code-paths leading to backup_clean (via job_clean) have the job's
context already acquired. The job's context is guaranteed to be the same
as the one used by backup_top via backup_job_create.

Since the previous logic effectively acquired the lock twice, this
broke cleanup of backups for disks using IO threads, since the BDRV_POLL_WHILE
in bdrv_backup_top_drop -> bdrv_do_drained_begin would only release the lock
once, thus deadlocking with the IO thread.

This is a partial revert of 0abf258171.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200407115651.69472-4-s.reiter@proxmox.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-04-07 14:34:47 +02:00
Stefan Reiter 08558e3325 replication: assert we own context before job_cancel_sync
job_cancel_sync requires the job's lock to be held, all other callers
already do this (replication_stop, drive_backup_abort,
blockdev_backup_abort, job_cancel_sync_all, cancel_common).

In this case we're in a BlockDriver handler, so we already have a lock,
just assert that it is the same as the one used for the commit_job.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Message-Id: <20200407115651.69472-3-s.reiter@proxmox.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-04-07 14:34:47 +02:00
Alberto Garcia fb43d2d46e qcow2: Check request size in qcow2_co_pwritev_compressed_part()
When issuing a compressed write request the number of bytes must be a
multiple of the cluster size or reach the end of the last cluster.

With the current code such requests are allowed and we hit an
assertion:

   $ qemu-img create -f qcow2 img.qcow2 1M
   $ qemu-io -c 'write -c 0 32k' img.qcow2

   qemu-io: block/qcow2.c:4257: qcow2_co_pwritev_compressed_task:
   Assertion `bytes == s->cluster_size || (bytes < s->cluster_size &&
              (offset + bytes == bs->total_sectors << BDRV_SECTOR_BITS))' failed.
   Aborted

This patch fixes a regression introduced in 0d483dce38

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200406143401.26854-1-berto@igalia.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-04-07 13:51:09 +02:00
Alberto Garcia 80f5c01183 qcow2: Forbid discard in qcow2 v2 images with backing files
A discard request deallocates the selected clusters so they read back
as zeroes. This is done by clearing the cluster offset field and
setting QCOW_OFLAG_ZERO in the L2 entry.

This flag is however only supported when qcow_version >= 3. In older
images the cluster is simply deallocated, exposing any possible stale
data from the backing file.

Since discard is an advisory operation it's safer to simply forbid it
in this scenario.

Note that we are adding this check to qcow2_co_pdiscard() and not to
qcow2_cluster_discard() or discard_in_l2_slice() because the last
two are also used by qcow2_snapshot_create() to discard the clusters
used by the VM state. In this case there's no risk of exposing stale
data to the guest and we really want that the clusters are always
discarded.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200331114345.29993-1-berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-04-07 13:51:09 +02:00
Kevin Wolf df74b1d3df qcow2: Remove unused fields from BDRVQcow2State
These fields were already removed in commit c3c10f72, but then commit
b58deb34 revived them probably due to bad merge conflict resolution.
They are still unused, so remove them again.

Fixes: b58deb344d
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200326170757.12344-1-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-03-27 14:47:23 +01:00
Kevin Wolf ce8cabbd17 mirror: Wait only for in-flight operations
mirror_wait_for_free_in_flight_slot() just picks a random operation to
wait for. However, a MirrorOp is already in s->ops_in_flight when
mirror_co_read() waits for free slots, so if not enough slots are
immediately available, an operation can end up waiting for itself, or
two or more operations can wait for each other to complete, which
results in a hang.

Fix this by adding a flag to MirrorOp that tells us if the request is
already in flight (and therefore occupies slots that it will later
free), and picking only such operations for waiting.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1794692
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200326153628.4869-3-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-03-27 14:47:23 +01:00
Kevin Wolf 9178f4fe5f Revert "mirror: Don't let an operation wait for itself"
This reverts commit 7e6c4ff792.

The fix was incomplete as it only protected against requests waiting for
themselves, but not against requests waiting for each other. We need a
different solution.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200326153628.4869-2-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-03-27 14:47:23 +01:00
Chen Qun 34afc5c298 block/iscsi:use the flags in iscsi_open() prevent Clang warning
Clang static code analyzer show warning:
  block/iscsi.c:1920:9: warning: Value stored to 'flags' is never read
        flags &= ~BDRV_O_RDWR;
        ^        ~~~~~~~~~~~~

In iscsi_allocmap_init() only checks BDRV_O_NOCACHE, which
is the same in both of flags and bs->open_flags.
We can use the flags instead bs->open_flags to prevent Clang warning.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200311032927.35092-1-kuhn.chenqun@huawei.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-03-27 14:47:23 +01:00
Eric Blake ed04991063 sheepdog: Consistently set bdrv_has_zero_init_truncate
block_int.h claims that .bdrv_has_zero_init must return 0 if
.bdrv_has_zero_init_truncate does likewise; but this is violated if
only the former callback is provided if .bdrv_co_truncate also exists.
When adding the latter callback, it was mistakenly added to only one
of the three possible sheepdog instantiations.

Fixes: 1dcaf527
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200324174233.1622067-5-eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-26 14:44:33 +01:00
Eric Blake e7be13ad3f qcow2: Avoid feature name extension on small cluster size
As the feature name table can be quite large (over 9k if all 64 bits
of all three feature fields have names; a mere 8 features leaves only
8 bytes for a backing file name in a 512-byte cluster), it is unwise
to emit this optional header in images with small cluster sizes.

Update iotest 036 to skip running on small cluster sizes; meanwhile,
note that iotest 061 never passed on alternative cluster sizes
(however, I limited this patch to tests with output affected by adding
feature names, rather than auditing for other tests that are not
robust to alternative cluster sizes).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200324174233.1622067-4-eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-26 14:44:33 +01:00
Eric Blake bb40ebce2c qcow2: List autoclear bit names in header
The feature table is supposed to advertise the name of all feature
bits that we support; however, we forgot to update the table for
autoclear bits.  While at it, move the table to read-only memory in
code, and tweak the qcow2 spec to name the second autoclear bit.
Update iotests that are affected by the longer header length.

Fixes: 88ddffae
Fixes: 93c24936
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200324174233.1622067-3-eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-26 14:44:33 +01:00
Eric Blake a951a631b9 qcow2: Comment typo fixes
Various trivial typos noticed while working on this file.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200324174233.1622067-2-eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-26 14:44:33 +01:00
Maxim Levitsky 5a5e7f8cd8 block: trickle down the fallback image creation function use to the block drivers
Instead of checking the .bdrv_co_create_opts to see if we need the
fallback, just implement the .bdrv_co_create_opts in the drivers that
need it.

This way we don't break various places that need to know if the
underlying protocol/format really supports image creation, and this way
we still allow some drivers to not support image creation.

Fixes: fd17146cd9
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1816007

Note that technically this driver reverts the image creation fallback
for the vxhs driver since I don't have a means to test it, and IMHO it
is better to leave it not supported as it was prior to generic image
creation patches.

Also drop iscsi_create_opts which was left accidentally.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200326011218.29230-3-mlevitsk@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
[mreitz: Fixed alignment, and moved bdrv_co_create_opts_simple() and
         bdrv_create_opts_simple from block.h into block_int.h]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-26 14:44:33 +01:00
Maxim Levitsky b92902dfea block: pass BlockDriver reference to the .bdrv_co_create
This will allow the reuse of a single generic .bdrv_co_create
implementation for several drivers.
No functional changes.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200326011218.29230-2-mlevitsk@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-26 14:44:33 +01:00
Vladimir Sementsov-Ogievskiy 66c8672d24 block/mirror: fix use after free of local_err
local_err is used again in mirror_exit_common() after
bdrv_set_backing_hd(), so we must zero it. Otherwise try to set
non-NULL local_err will crash.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200324153630.11882-3-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-26 14:44:32 +01:00
Vladimir Sementsov-Ogievskiy 808cf3cb6a block/qcow2: zero data_file child after free
data_file being NULL doesn't seem to be a correct state, but it's
better than dead pointer and simpler to debug.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200316060631.30052-3-vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-24 11:41:46 +01:00
Eric Blake 71eaec2e8c block: Avoid memleak on qcow2 image info failure
If we fail to get bitmap info, we must not leak the encryption info.

Fixes: b8968c875f
Fixes: Coverity CID 1421894
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200320183620.1112123-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Tested-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-24 11:41:46 +01:00
Peter Maydell e6d567db23 Pull request
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+ber27ys35W+dsvQfe+BBqr8OQ4FAl5yg0AACgkQfe+BBqr8
 OQ7uWBAArnxhM0kej5NAQiGsC3sHDOGUszKhsz8qwSOYQ0NkOH9495ksklOuw8T9
 v/7CAp4rqvma00nL5D+6Srjc9Pe+9BdIz3aa0xvi+tkXvRvFKcWoZlLyNpK+SCbB
 FwT5sAzoZ004cijG3xsV8U/1i/jEWiRk0AV8USuGymTkQh0M9XS/MHVlrv0HJU9k
 2tLz7FGHBmEGhXFXKjviNsrQfvHl2AzIKIYnOcK6OhF6wevJzRz8ShsYcmhyN7V3
 rBm0IDiJxJ54a6oZ8FyukkqL5gj+mZDEoGXnspvWkmixqjmNw0iAWgyWNk8vnBue
 ROphtzZ7xvATu560vIXnlj9q+new5h1ZEb2Q1Nl6WL0PFHJ0vEmRHnpSj+lUTHFD
 aEwLL72qk12yZFPwEO9V+tO15mE+9vYbbsfe8SXVIlH52ec+OHGQ3/ytulL2zEDz
 vVJYy37wOhp8f5TFpdmFXi8ovHV7a8g3Vr6Hh1XmcEW0o/GjEgunjW7V+JHSJlDe
 XBp5IXsz0+zWn2hCxtjUzX3qBDIswyR89qR/NRJKpuu1D5VSaqVwM3vd5dCgDLHA
 lnCCi8kcq9rsnRNyeBb0t8d7c6e1HSW8OIHL4OC515VUFdrKSEyxprzDyIMPifDg
 9B9ImLXg3CgOQVOXPU/0mbh4beTIeI7uJrmgzvzs3vZc+dLG+H8=
 =xYlZ
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/jnsnow/tags/bitmaps-pull-request' into staging

Pull request

# gpg: Signature made Wed 18 Mar 2020 20:23:28 GMT
# gpg:                using RSA key F9B7ABDBBCACDF95BE76CBD07DEF8106AAFC390E
# gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>" [full]
# Primary key fingerprint: FAEB 9711 A12C F475 812F  18F2 88A9 064D 1835 61EB
#      Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76  CBD0 7DEF 8106 AAFC 390E

* remotes/jnsnow/tags/bitmaps-pull-request:
  block/qcow2-bitmap: use bdrv_dirty_bitmap_next_dirty
  nbd/server: use bdrv_dirty_bitmap_next_dirty_area
  nbd/server: introduce NBDExtentArray
  block/dirty-bitmap: improve _next_dirty_area API
  block/dirty-bitmap: add _next_dirty API
  block/dirty-bitmap: switch _next_dirty_area and _next_zero to int64_t
  hbitmap: drop meta bitmaps as they are unused
  hbitmap: unpublish hbitmap_iter_skip_words
  hbitmap: move hbitmap_iter_next_word to hbitmap.c
  hbitmap: assert that we don't create bitmap larger than INT64_MAX
  build: Silence clang warning on older glib autoptr usage

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-03-19 15:31:09 +00:00
Vladimir Sementsov-Ogievskiy 2d00cbd8e2 block/qcow2-bitmap: use bdrv_dirty_bitmap_next_dirty
store_bitmap_data() loop does bdrv_set_dirty_iter() on each iteration,
which means that we actually don't need iterator itself and we can use
simpler bitmap API.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20200205112041.6003-11-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2020-03-18 14:03:46 -04:00
Vladimir Sementsov-Ogievskiy 299ea9ff01 block/dirty-bitmap: improve _next_dirty_area API
Firstly, _next_dirty_area is for scenarios when we may contiguously
search for next dirty area inside some limited region, so it is more
comfortable to specify "end" which should not be recalculated on each
iteration.

Secondly, let's add a possibility to limit resulting area size, not
limiting searching area. This will be used in NBD code in further
commit. (Note that now bdrv_dirty_bitmap_next_dirty_area is unused)

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20200205112041.6003-8-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2020-03-18 14:03:46 -04:00
Vladimir Sementsov-Ogievskiy 9399c54b75 block/dirty-bitmap: add _next_dirty API
We have bdrv_dirty_bitmap_next_zero, let's add corresponding
bdrv_dirty_bitmap_next_dirty, which is more comfortable to use than
bitmap iterators in some cases.

For test modify test_hbitmap_next_zero_check_range to check both
next_zero and next_dirty and add some new checks.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20200205112041.6003-7-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2020-03-18 14:03:46 -04:00
Vladimir Sementsov-Ogievskiy 642700fda0 block/dirty-bitmap: switch _next_dirty_area and _next_zero to int64_t
We are going to introduce bdrv_dirty_bitmap_next_dirty so that same
variable may be used to store its return value and to be its parameter,
so it would int64_t.

Similarly, we are going to refactor hbitmap_next_dirty_area to use
hbitmap_next_dirty together with hbitmap_next_zero, therefore we want
hbitmap_next_zero parameter type to be int64_t too.

So, for convenience update all parameters of *_next_zero and
*_next_dirty_area to be int64_t.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20200205112041.6003-6-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2020-03-18 14:03:46 -04:00
Peter Maydell cf4b64406c Error reporting patches for 2020-03-17
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl5w+zkSHGFybWJydUBy
 ZWRoYXQuY29tAAoJEDhwtADrkYZTaeAQALPrnwX3g9/HLm2YHc1P0TB1eTenBqen
 K204sRW53waxzm4g9trb8P4Nzmp8r1oGmZfPriVzB3ykoW2Kzfu+4oa95+YT+exk
 H4XSQfCvCp1e/ZShkx5rY9Kg1gSgWhQ00MNwz8puHUsHtcp5dMTkmYqL4hzgWnA0
 TwV7w06+6kLP4fRglIc5X7BVggBKosmMPfvjg/KYUe12Z3moSSQZA5dyEp5VAVl9
 MNFJpryWVek6+Z8UFiQ3CMmR/H2UVI0liDlU1aZsR9pcyjiuJxrBEwboVO5qY3N7
 lraKg+CVdiK7rn21bs6wAFOk08eG8VqZMeTb7HU6KJ6FIP2KopwvRXIEmNgo2C/C
 xU3XRl5oyRtaAOKSnwOBzEhZZ+wTRp2RcMzFS6p7URm5R3LNfB1dlqE7yE5z4lcl
 EgdbMLy4LiMkKwUPrVGBwzZNDO6ywVjFWUcHze9Dyb3z1ciWhwEENaIGe0CU3lhG
 ii+GxTzMTGoeJ2HE2hRmGTLACNt7a/we88aDY0kDLeVz5rq80oa+xckqV/oG3XpN
 v/imWHMugdsUwmQshUrT0JQq+BCnuwiHc82pm0X8bTqtJ6TmoIYhxuJkh040QIxt
 5ymFfAMz7ysc+50JY7OEVRI/8YQPyCaZmst/D42dicWUU9NdasWcIx+kCmK3LOjj
 0/Nb4vfX3xgN
 =vpk3
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2020-03-17' into staging

Error reporting patches for 2020-03-17

# gpg: Signature made Tue 17 Mar 2020 16:30:49 GMT
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-error-2020-03-17:
  hw/sd/ssi-sd: fix error handling in ssi_sd_realize
  xen-block: Use one Error * variable instead of two
  hw/misc/ivshmem: Use one Error * variable instead of two
  Use &error_abort instead of separate assert()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-03-18 17:57:40 +00:00
Peter Maydell d649689a8e * Bugfixes all over the place
* get/set_uint cleanups (Felipe)
 * Lock guard support (Stefan)
 * MemoryRegion ownership cleanup (Philippe)
 * AVX512 optimization for buffer_is_zero (Robert)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJecOZiAAoJEL/70l94x66DgGkH/jpY4IgqlSAAWCgaxfe1n1vg
 ahSzSLrC8wiJq2Jxbmxn+5BbH6BxQ9ibflsY5bvCY/sTb7UlOFCPkFhQ2iUgplkw
 ciB5UfgCA6OHpKEhpHhXtzlybtNOlxXNWYJ1SrcVXbRES8f7XdhMKs15mnJJuOOE
 k/tuZo/44yZRJl0Cv+nkvIFcCVgyu1q0Lln/1MMPngY2r9gt893cY9feTBSSWgnp
 +7HZr5TXI7mcIytczFKzbdujlG4391DGejKX66IIxGcWg9vXS7TwAStzH1vSKVfJ
 73SKZBoCU5gpHHHC+dqVyouMerV+UE+WQPNtF+LCsNgJBw/2NXc1ZgDrtz1OI2c=
 =+LRX
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* Bugfixes all over the place
* get/set_uint cleanups (Felipe)
* Lock guard support (Stefan)
* MemoryRegion ownership cleanup (Philippe)
* AVX512 optimization for buffer_is_zero (Robert)

# gpg: Signature made Tue 17 Mar 2020 15:01:54 GMT
# gpg:                using RSA key BFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (62 commits)
  hw/arm: Let devices own the MemoryRegion they create
  hw/arm: Remove unnecessary memory_region_set_readonly() on ROM alias
  hw/ppc/ppc405: Use memory_region_init_rom() with read-only regions
  hw/arm/stm32: Use memory_region_init_rom() with read-only regions
  hw/char: Let devices own the MemoryRegion they create
  hw/riscv: Let devices own the MemoryRegion they create
  hw/dma: Let devices own the MemoryRegion they create
  hw/display: Let devices own the MemoryRegion they create
  hw/core: Let devices own the MemoryRegion they create
  scripts/cocci: Patch to let devices own their MemoryRegions
  scripts/cocci: Patch to remove unnecessary memory_region_set_readonly()
  scripts/cocci: Patch to detect potential use of memory_region_init_rom
  hw/sparc: Use memory_region_init_rom() with read-only regions
  hw/sh4: Use memory_region_init_rom() with read-only regions
  hw/riscv: Use memory_region_init_rom() with read-only regions
  hw/ppc: Use memory_region_init_rom() with read-only regions
  hw/pci-host: Use memory_region_init_rom() with read-only regions
  hw/net: Use memory_region_init_rom() with read-only regions
  hw/m68k: Use memory_region_init_rom() with read-only regions
  hw/display: Use memory_region_init_rom() with read-only regions
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-03-17 18:33:05 +00:00
Markus Armbruster 20ac582d0c Use &error_abort instead of separate assert()
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200313170517.22480-2-armbru@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Alexander Bulekov <alxndr@bu.edu>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[Unused Error *variable deleted]
2020-03-17 16:05:40 +01:00
Philippe Mathieu-Daudé 880a7817c1 misc: Replace zero-length arrays with flexible array member (manual)
Description copied from Linux kernel commit from Gustavo A. R. Silva
(see [3]):

--v-- description start --v--

  The current codebase makes use of the zero-length array language
  extension to the C90 standard, but the preferred mechanism to
  declare variable-length types such as these ones is a flexible
  array member [1], introduced in C99:

  struct foo {
      int stuff;
      struct boo array[];
  };

  By making use of the mechanism above, we will get a compiler
  warning in case the flexible array does not occur last in the
  structure, which will help us prevent some kind of undefined
  behavior bugs from being unadvertenly introduced [2] to the
  Linux codebase from now on.

--^-- description end --^--

Do the similar housekeeping in the QEMU codebase (which uses
C99 since commit 7be41675f7).

All these instances of code were found with the help of the
following command (then manual analysis, without modifying
structures only having a single flexible array member, such
QEDTable in block/qed.h):

  git grep -F '[0];'

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=76497732932f
[3] https://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git/commit/?id=17642a2fbd2c1

Inspired-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 22:07:42 +01:00
Philippe Mathieu-Daudé f7795e4096 misc: Replace zero-length arrays with flexible array member (automatic)
Description copied from Linux kernel commit from Gustavo A. R. Silva
(see [3]):

--v-- description start --v--

  The current codebase makes use of the zero-length array language
  extension to the C90 standard, but the preferred mechanism to
  declare variable-length types such as these ones is a flexible
  array member [1], introduced in C99:

  struct foo {
      int stuff;
      struct boo array[];
  };

  By making use of the mechanism above, we will get a compiler
  warning in case the flexible array does not occur last in the
  structure, which will help us prevent some kind of undefined
  behavior bugs from being unadvertenly introduced [2] to the
  Linux codebase from now on.

--^-- description end --^--

Do the similar housekeeping in the QEMU codebase (which uses
C99 since commit 7be41675f7).

All these instances of code were found with the help of the
following Coccinelle script:

  @@
  identifier s, m, a;
  type t, T;
  @@
   struct s {
      ...
      t m;
  -   T a[0];
  +   T a[];
  };
  @@
  identifier s, m, a;
  type t, T;
  @@
   struct s {
      ...
      t m;
  -   T a[0];
  +   T a[];
   } QEMU_PACKED;

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=76497732932f
[3] https://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git/commit/?id=17642a2fbd2c1

Inspired-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 22:07:42 +01:00
Vladimir Sementsov-Ogievskiy 4ab78b1918 block/io: fix bdrv_co_do_copy_on_readv
Prior to 1143ec5ebf it was OK to qemu_iovec_from_buf() from aligned-up
buffer to original qiov, as qemu_iovec_from_buf() will stop at qiov end
anyway.

But after 1143ec5ebf we assume that bdrv_co_do_copy_on_readv works on
part of original qiov, defined by qiov_offset and bytes. So we must not
touch qiov behind qiov_offset+bytes bound. Fix it.

Cc: qemu-stable@nongnu.org # v4.2
Fixes: 1143ec5ebf
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20200312081949.5350-1-vsementsov@virtuozzo.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-03-16 11:46:11 +00:00
Peter Maydell 49780a582d Block layer patches:
- Relax restrictions for blockdev-snapshot (allows libvirt to do live
   storage migration with blockdev-mirror)
 - luks: Delete created files when block_crypto_co_create_opts_luks fails
 - Fix memleaks in qmp_object_add
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJeaQYTAAoJEH8JsnLIjy/WQwYP/3pzAjqVecL3dGmnPWAkBqCV
 CFpxT2nIMe+xCBvWQBoeekHsFJ7GQf4E1WVNRZgoAh9VQkvkajZsVNn8Auo2Veq2
 c7w/R4xf/Wet2hKGVRS0JXwbg69U5BbpcF7E2DRNfp+CvaDCafHSDNeGTb3hFUjT
 x1jwhK6VqfY7+LHU0B0QmX2KA66nDx1p8l8HJQYd1MlCKAbj8kv/swEbqBJn32hA
 32CIYfC4VCqkW5va1eOjd3Kyi/ugkFCHTI8+mOa45/BFBzIiKfCsDaFHh/DI59QB
 qcDKkUcO3+W788vCKgJQGnG070TwKPx2OnjhxFKiEGaoX3Sz+AY4wUvf3mfFl8GM
 zYqTdOy4Xh0ckvA6JCS0jtAKmANkeEGqnECAgub22Z+kyOzqC05B1FkYwqYDcFXY
 atWKm5Vr47jgD6Oq6O0OpZaZrAUWOfoBmq4ErnrBEHuW5329NEInmjYwxednK+43
 CwU/lSdX7ujRSsjS8Xi1dHS4pxHK/mg51dInL44zGFUayegiLPgA8cuESE0mHOfZ
 67X14rxu6D4Y5r0L+w7rsSGjByR29VynE1McL9fZ1Wp29JHaQ5fjdG6GMXEwYxmV
 R0YNXe85FAlNgqj0Bme+fR2YPxZ48NqHIMOvFFStNHyfD0qQN0TtT1iarfEcBR0u
 jm8MnSoIDLkRXEgBW9bW
 =ociU
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- Relax restrictions for blockdev-snapshot (allows libvirt to do live
  storage migration with blockdev-mirror)
- luks: Delete created files when block_crypto_co_create_opts_luks fails
- Fix memleaks in qmp_object_add

# gpg: Signature made Wed 11 Mar 2020 15:38:59 GMT
# gpg:                using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream:
  qemu-iotests: adding LUKS cleanup for non-UTF8 secret error
  crypto.c: cleanup created file when block_crypto_co_create_opts_luks fails
  block.c: adding bdrv_co_delete_file
  block: introducing 'bdrv_co_delete_file' interface
  tests/qemu-iotests: Fix socket_scm_helper build path
  qapi: Add '@allow-write-only-overlay' feature for 'blockdev-snapshot'
  iotests: Add iothread cases to 155
  block: Fix cross-AioContext blockdev-snapshot
  iotests: Test mirror with temporarily disabled target backing file
  iotests: Fix run_job() with use_log=False
  block: Relax restrictions for blockdev-snapshot
  block: Make bdrv_get_cumulative_perm() public
  qom-qmp-cmds: fix two memleaks in qmp_object_add

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-03-12 16:51:26 +00:00
Daniel Henrique Barboza 1bba30da24 crypto.c: cleanup created file when block_crypto_co_create_opts_luks fails
When using a non-UTF8 secret to create a volume using qemu-img, the
following error happens:

$ qemu-img create -f luks --object secret,id=vol_1_encrypt0,file=vol_resize_pool.vol_1.secret.qzVQrI -o key-secret=vol_1_encrypt0 /var/tmp/pool_target/vol_1 10240K

Formatting '/var/tmp/pool_target/vol_1', fmt=luks size=10485760 key-secret=vol_1_encrypt0
qemu-img: /var/tmp/pool_target/vol_1: Data from secret vol_1_encrypt0 is not valid UTF-8

However, the created file '/var/tmp/pool_target/vol_1' is left behind in the
file system after the failure. This behavior can be observed when creating
the volume using Libvirt, via 'virsh vol-create', and then getting "volume
target path already exist" errors when trying to re-create the volume.

The volume file is created inside block_crypto_co_create_opts_luks(), in
block/crypto.c. If the bdrv_create_file() call is successful but any
succeeding step fails*, the existing 'fail' label does not take into
account the created file, leaving it behind.

This patch changes block_crypto_co_create_opts_luks() to delete
'filename' in case of failure. A failure in this point means that
the volume is now truncated/corrupted, so even if 'filename' was an
existing volume before calling qemu-img, it is now unusable. Deleting
the file it is not much worse than leaving it in the filesystem in
this scenario, and we don't have to deal with checking the file
pre-existence in the code.

* in our case, block_crypto_co_create_generic calls qcrypto_block_create,
which calls qcrypto_block_luks_create, and this function fails when
calling qcrypto_secret_lookup_as_utf8.

Reported-by: Srikanth Aithal <bssrikanth@in.ibm.com>
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200130213907.2830642-4-danielhb413@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-03-11 15:54:38 +01:00
Daniel Henrique Barboza 9bffae14df block: introducing 'bdrv_co_delete_file' interface
Adding to Block Drivers the capability of being able to clean up
its created files can be useful in certain situations. For the
LUKS driver, for instance, a failure in one of its authentication
steps can leave files in the host that weren't there before.

This patch adds the 'bdrv_co_delete_file' interface to block
drivers and add it to the 'file' driver in file-posix.c. The
implementation is given by 'raw_co_delete_file'.

Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200130213907.2830642-2-danielhb413@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-03-11 15:54:38 +01:00
Vladimir Sementsov-Ogievskiy 397f4e9d83 block/block-copy: hide structure definitions
Hide structure definitions and add explicit API instead, to keep an
eye on the scope of the shared fields.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200311103004.7649-10-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy 5332e5d210 block/block-copy: reduce intersecting request lock
Currently, block_copy operation lock the whole requested region. But
there is no reason to lock clusters, which are already copied, it will
disturb other parallel block_copy requests for no reason.

Let's instead do the following:

Lock only sub-region, which we are going to operate on. Then, after
copying all dirty sub-regions, we should wait for intersecting
requests block-copy, if they failed, we should retry these new dirty
clusters.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Message-Id: <20200311103004.7649-9-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy 8719091f9d block/block-copy: rename start to offset in interfaces
offset/bytes pair is more usual naming in block layer, let's use it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200311103004.7649-8-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy dafaf13593 block/block-copy: refactor interfaces to use bytes instead of end
We have a lot of "chunk_end - start" invocations, let's switch to
bytes/cur_bytes scheme instead.

While being here, improve check on block_copy_do_copy parameters to not
overflow when calculating nbytes and use int64_t for bytes in
block_copy for consistency.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200311103004.7649-7-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy 17187cb646 block/block-copy: factor out find_conflicting_inflight_req
Split find_conflicting_inflight_req to be used separately.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200311103004.7649-6-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy 2d57511a88 block/block-copy: use block_status
Use bdrv_block_status_above to chose effective chunk size and to handle
zeroes effectively.

This substitutes checking for just being allocated or not, and drops
old code path for it. Assistance by backup job is dropped too, as
caching block-status information is more difficult than just caching
is-allocated information in our dirty bitmap, and backup job is not
good place for this caching anyway.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200311103004.7649-5-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy 9d31bc53fa block/block-copy: specialcase first copy_range request
In block_copy_do_copy we fallback to read+write if copy_range failed.
In this case copy_size is larger than defined for buffered IO, and
there is corresponding commit. Still, backup copies data cluster by
cluster, and most of requests are limited to one cluster anyway, so the
only source of this one bad-limited request is copy-before-write
operation.

Further patch will move backup to use block_copy directly, than for
cases where copy_range is not supported, first request will be
oversized in each backup. It's not good, let's change it now.

Fix is simple: just limit first copy_range request like buffer-based
request. If it succeed, set larger copy_range limit.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200311103004.7649-4-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy d0ebeca14a block/block-copy: fix progress calculation
Assume we have two regions, A and B, and region B is in-flight now,
region A is not yet touched, but it is unallocated and should be
skipped.

Correspondingly, as progress we have

  total = A + B
  current = 0

If we reset unallocated region A and call progress_reset_callback,
it will calculate 0 bytes dirty in the bitmap and call
job_progress_set_remaining, which will set

   total = current + 0 = 0 + 0 = 0

So, B bytes are actually removed from total accounting. When job
finishes we'll have

   total = 0
   current = B

, which doesn't sound good.

This is because we didn't considered in-flight bytes, actually when
calculating remaining, we should have set (in_flight + dirty_bytes)
as remaining, not only dirty_bytes.

To fix it, let's refactor progress calculation, moving it to block-copy
itself instead of fixing callback. And, of course, track in_flight
bytes count.

We still have to keep one callback, to maintain backup job bytes_read
calculation, but it will go on soon, when we turn the whole backup
process into one block_copy call.

Cc: qemu-stable@nongnu.org
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Message-Id: <20200311103004.7649-3-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy e7266570f2 block/qcow2-threads: fix qcow2_decompress
On success path we return what inflate() returns instead of 0. And it
most probably works for Z_STREAM_END as it is positive, but is
definitely broken for Z_BUF_ERROR.

While being here, switch to errno return code, to be closer to
qcow2_compress API (and usual expectations).

Revert condition in if to be more positive. Drop dead initialization of
ret.

Cc: qemu-stable@nongnu.org # v4.0
Fixes: 341926ab83
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200302150930.16218-1-vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-11 12:42:30 +01:00
Pan Nengyuan 4aebf0f0da block/qcow2: do free crypto_opts in qcow2_close()
'crypto_opts' forgot to free in qcow2_close(), this patch fix the bellow leak stack:

Direct leak of 24 byte(s) in 1 object(s) allocated from:
    #0 0x7f0edd81f970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970)
    #1 0x7f0edc6d149d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d)
    #2 0x55d7eaede63d in qobject_input_start_struct /mnt/sdb/qemu-new/qemu_test/qemu/qapi/qobject-input-visitor.c:295
    #3 0x55d7eaed78b8 in visit_start_struct /mnt/sdb/qemu-new/qemu_test/qemu/qapi/qapi-visit-core.c:49
    #4 0x55d7eaf5140b in visit_type_QCryptoBlockOpenOptions qapi/qapi-visit-crypto.c:290
    #5 0x55d7eae43af3 in block_crypto_open_opts_init /mnt/sdb/qemu-new/qemu_test/qemu/block/crypto.c:163
    #6 0x55d7eacd2924 in qcow2_update_options_prepare /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1148
    #7 0x55d7eacd33f7 in qcow2_update_options /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1232
    #8 0x55d7eacd9680 in qcow2_do_open /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1512
    #9 0x55d7eacdc55e in qcow2_open_entry /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1792
    #10 0x55d7eacdc8fe in qcow2_open /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1819
    #11 0x55d7eac3742d in bdrv_open_driver /mnt/sdb/qemu-new/qemu_test/qemu/block.c:1317
    #12 0x55d7eac3e990 in bdrv_open_common /mnt/sdb/qemu-new/qemu_test/qemu/block.c:1575
    #13 0x55d7eac4442c in bdrv_open_inherit /mnt/sdb/qemu-new/qemu_test/qemu/block.c:3126
    #14 0x55d7eac45c3f in bdrv_open /mnt/sdb/qemu-new/qemu_test/qemu/block.c:3219
    #15 0x55d7ead8e8a4 in blk_new_open /mnt/sdb/qemu-new/qemu_test/qemu/block/block-backend.c:397
    #16 0x55d7eacde74c in qcow2_co_create /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:3534
    #17 0x55d7eacdfa6d in qcow2_co_create_opts /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:3668
    #18 0x55d7eac1c678 in bdrv_create_co_entry /mnt/sdb/qemu-new/qemu_test/qemu/block.c:485
    #19 0x55d7eb0024d2 in coroutine_trampoline /mnt/sdb/qemu-new/qemu_test/qemu/util/coroutine-ucontext.c:115

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200227012950.12256-2-pannengyuan@huawei.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-11 12:42:30 +01:00
David Edmondson 69032253c3 block/curl: HTTP header field names are case insensitive
RFC 7230 section 3.2 indicates that HTTP header field names are case
insensitive.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20200224101310.101169-3-david.edmondson@oracle.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-11 12:42:29 +01:00
David Edmondson 7788a31939 block/curl: HTTP header fields allow whitespace around values
RFC 7230 section 3.2 indicates that whitespace is permitted between
the field name and field value and after the field value.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20200224101310.101169-2-david.edmondson@oracle.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-11 12:42:29 +01:00
Stefan Hajnoczi a9da6e49d8 luks: implement .bdrv_measure()
Add qemu-img measure support in the "luks" block driver.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200221112522.1497712-3-stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-11 12:42:29 +01:00
Stefan Hajnoczi 6d49d3a859 luks: extract qcrypto_block_calculate_payload_offset()
The qcow2 .bdrv_measure() code calculates the crypto payload offset.
This logic really belongs in crypto/block.c where it can be reused by
other image formats.

The "luks" block driver will need this same logic in order to implement
.bdrv_measure(), so extract the qcrypto_block_calculate_payload_offset()
function now.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200221112522.1497712-2-stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-11 12:42:29 +01:00
Maxim Levitsky 89802d5ae7 monitor/hmp: Move hmp_drive_add_node to block-hmp-cmds.c
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200308092440.23564-12-mlevitsk@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-03-09 18:20:22 +00:00
Maxim Levitsky 2bcad73c4b monitor/hmp: move hmp_info_block* to block-hmp-cmds.c
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200308092440.23564-11-mlevitsk@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-03-09 18:20:21 +00:00
Maxim Levitsky 1061f8dd80 monitor/hmp: move remaining hmp_block* functions to block-hmp-cmds.c
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200308092440.23564-10-mlevitsk@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-03-09 18:20:13 +00:00
Maxim Levitsky e263120ecc monitor/hmp: move hmp_nbd_server* to block-hmp-cmds.c
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200308092440.23564-9-mlevitsk@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-03-09 18:17:58 +00:00
Maxim Levitsky fce2b91fdf monitor/hmp: move hmp_snapshot_* to block-hmp-cmds.c
hmp_snapshot_blkdev is from GPLv2 version of the hmp-cmds.c thus
have to change the licence to GPLv2

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200308092440.23564-8-mlevitsk@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-03-09 18:07:50 +00:00
Maxim Levitsky 6b7fbf61fb monitor/hmp: move hmp_block_job* to block-hmp-cmds.c
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200308092440.23564-7-mlevitsk@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-03-09 18:07:48 +00:00
Maxim Levitsky 0932e3f23d monitor/hmp: move hmp_drive_mirror and hmp_drive_backup to block-hmp-cmds.c
Moved code was added after 2012-01-13, thus under GPLv2+

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200308092440.23564-6-mlevitsk@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  Fixed commit message
2020-03-09 18:07:35 +00:00
Maxim Levitsky a1edae276a monitor/hmp: move hmp_drive_del and hmp_commit to block-hmp-cmds.c
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200308092440.23564-5-mlevitsk@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-03-09 18:05:33 +00:00
Maxim Levitsky a2dde2f221 monitor/hmp: rename device-hotplug.c to block/monitor/block-hmp-cmds.c
These days device-hotplug.c only contains the hmp_drive_add
In the next patch, rest of hmp_drive* functions will be moved
there.

Also add block-hmp-cmds.h to contain prototypes of these
functions

License for block-hmp-cmds.h since it contains the code
moved from sysemu.h which lacks license and thus according
to LICENSE is under GPLv2+

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200308092440.23564-4-mlevitsk@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-03-09 18:05:31 +00:00
Chen Qun 76e91cda07 block/file-posix: Remove redundant statement in raw_handle_perm_lock()
Clang static code analyzer show warning:
  block/file-posix.c:891:9: warning: Value stored to 'op' is never read
        op = RAW_PL_ABORT;
        ^    ~~~~~~~~~~~~

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200302130715.29440-5-kuhn.chenqun@huawei.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-03-09 15:59:31 +01:00
Chen Qun 35c9453592 block/stream: Remove redundant statement in stream_run()
Clang static code analyzer show warning:
  block/stream.c:186:9: warning: Value stored to 'ret' is never read
        ret = 0;
        ^     ~
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200302130715.29440-3-kuhn.chenqun@huawei.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-03-09 15:59:31 +01:00
Florian Florensa 19ae9ae014 block/rbd: Add support for ceph namespaces
Starting from ceph Nautilus, RBD has support for namespaces, allowing
for finer grain ACLs on images inside a pool, and tenant isolation.

In the rbd cli tool documentation, the new image-spec and snap-spec are :
 - [pool-name/[namespace-name/]]image-name
 - [pool-name/[namespace-name/]]image-name@snap-name

When using an non namespace's enabled qemu, it complains about not
finding the image called namespace-name/image-name, thus we only need to
parse the image once again to find if there is a '/' in its name, and if
there is, use what is before it as the name of the namespace to later
pass it to rados_ioctx_set_namespace.
rados_ioctx_set_namespace if called with en empty string or a null
pointer as the namespace parameters pretty much does nothing, as it then
defaults to the default namespace.

The namespace is extracted inside qemu_rbd_parse_filename, stored in the
qdict, and used in qemu_rbd_connect to make it work with both qemu-img,
and qemu itself.

Signed-off-by: Florian Florensa <fflorensa@online.net>
Message-Id: <20200110111513.321728-2-fflorensa@online.net>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-03-06 17:21:28 +01:00
Kevin Wolf 14837c6493 qemu-storage-daemon: Add --blockdev option
This adds a --blockdev option to the storage daemon that works the same
as the -blockdev option of the system emulator.

In order to be able to link with blockdev.o, we also need to change
stream.o from common-obj to block-obj, which is where all other block
jobs already are.

In contrast to the system emulator, qemu-storage-daemon options will be
processed in the order they are given. The user needs to take care to
refer to other objects only after defining them.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200224143008.13362-7-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-03-06 17:15:38 +01:00
Kevin Wolf 12c929bca2 block: Move system emulator QMP commands to block/qapi-sysemu.c
These commands make only sense for system emulators and their
implementations call functions that don't exist in tools (e.g. to
resolve qdev IDs). Move them out so that blockdev.c can be linked to
qemu-storage-daemon.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200224143008.13362-4-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-03-06 17:15:38 +01:00
Peter Krempa 65eb7c85a3 block/qcow2: Move bitmap reopen into bdrv_reopen_commit_post
The bitmap code requires writing the 'file' child when the qcow2 driver
is reopened in read-write mode.

If the 'file' child is being reopened due to a permissions change, the
modification is commited yet when qcow2_reopen_commit is called. This
means that any attempt to write the 'file' child will end with EBADFD
as the original fd was already closed.

Moving bitmap reopening to the new callback which is called after
permission modifications are commited fixes this as the file descriptor
will be replaced with the correct one.

The above problem manifests itself when reopening 'qcow2' format layer
which uses a 'file-posix' file child which was opened with the
'auto-read-only' property set.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Message-Id: <db118dbafe1955afbc0a18d3dd220931074ce349.1582893284.git.pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-03-06 17:15:37 +01:00
Max Reitz 3ede935fdb qcow2: Fix alloc_cluster_abort() for pre-existing clusters
handle_alloc() reuses preallocated zero clusters.  If anything goes
wrong during the data write, we do not change their L2 entry, so we
must not let qcow2_alloc_cluster_abort() free them.

Fixes: 8b24cd1415
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200225143130.111267-2-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-03-06 17:15:37 +01:00
Lukas Straub 08ddb4eb6d block/replication.c: Ignore requests after failover
After failover the Secondary side of replication shouldn't change state, because
it now functions as our primary disk.

In replication_start, replication_do_checkpoint, replication_stop, ignore
the request if current state is BLOCK_REPLICATION_DONE (sucessful failover) or
BLOCK_REPLICATION_FAILOVER (failover in progres i.e. currently merging active
and hidden images into the base image).

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Acked-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2020-03-03 18:04:47 +08:00
Pan Nengyuan 8198cf5ef0 block/nbd: fix memory leak in nbd_open()
In currently implementation there will be a memory leak when
nbd_client_connect() returns error status. Here is an easy way to
reproduce:

1. run qemu-iotests as follow and check the result with asan:
    ./check -raw 143

Following is the asan output backtrack:
Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x7f629688a560 in calloc (/usr/lib64/libasan.so.3+0xc7560)
    #1 0x7f6295e7e015 in g_malloc0  (/usr/lib64/libglib-2.0.so.0+0x50015)
    #2 0x56281dab4642 in qobject_input_start_struct  /mnt/sdb/qemu-4.2.0-rc0/qapi/qobject-input-visitor.c:295
    #3 0x56281dab1a04 in visit_start_struct  /mnt/sdb/qemu-4.2.0-rc0/qapi/qapi-visit-core.c:49
    #4 0x56281dad1827 in visit_type_SocketAddress  qapi/qapi-visit-sockets.c:386
    #5 0x56281da8062f in nbd_config   /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1716
    #6 0x56281da8062f in nbd_process_options /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1829
    #7 0x56281da8062f in nbd_open /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1873

Direct leak of 15 byte(s) in 1 object(s) allocated from:
    #0 0x7f629688a3a0 in malloc (/usr/lib64/libasan.so.3+0xc73a0)
    #1 0x7f6295e7dfbd in g_malloc (/usr/lib64/libglib-2.0.so.0+0x4ffbd)
    #2 0x7f6295e96ace in g_strdup (/usr/lib64/libglib-2.0.so.0+0x68ace)
    #3 0x56281da804ac in nbd_process_options /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1834
    #4 0x56281da804ac in nbd_open /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1873

Indirect leak of 24 byte(s) in 1 object(s) allocated from:
    #0 0x7f629688a3a0 in malloc (/usr/lib64/libasan.so.3+0xc73a0)
    #1 0x7f6295e7dfbd in g_malloc (/usr/lib64/libglib-2.0.so.0+0x4ffbd)
    #2 0x7f6295e96ace in g_strdup (/usr/lib64/libglib-2.0.so.0+0x68ace)
    #3 0x56281dab41a3 in qobject_input_type_str_keyval /mnt/sdb/qemu-4.2.0-rc0/qapi/qobject-input-visitor.c:536
    #4 0x56281dab2ee9 in visit_type_str /mnt/sdb/qemu-4.2.0-rc0/qapi/qapi-visit-core.c:297
    #5 0x56281dad0fa1 in visit_type_UnixSocketAddress_members qapi/qapi-visit-sockets.c:141
    #6 0x56281dad17b6 in visit_type_SocketAddress_members qapi/qapi-visit-sockets.c:366
    #7 0x56281dad186a in visit_type_SocketAddress qapi/qapi-visit-sockets.c:393
    #8 0x56281da8062f in nbd_config /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1716
    #9 0x56281da8062f in nbd_process_options /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1829
    #10 0x56281da8062f in nbd_open /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1873

Fixes: 8f071c9db5
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: qemu-stable <qemu-stable@nongnu.org>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <1575517528-44312-3-git-send-email-pannengyuan@huawei.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-02-26 17:29:00 -06:00
Pan Nengyuan 7f493662be block/nbd: extract the common cleanup code
The BDRVNBDState cleanup code is common in two places, add
nbd_clear_bdrvstate() function to do these cleanups.

Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <1575517528-44312-2-git-send-email-pannengyuan@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: fix compilation error and commit message]
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-02-26 17:28:08 -06:00
Eric Blake 2485f22fe9 nbd-client: Support leading / in NBD URI
The NBD URI specification [1] states that only one leading slash at
the beginning of the URI path component is stripped, not all such
slashes.  This becomes important to a patch I just proposed to nbdkit
[2], which would allow the exportname to select a file embedded within
an ext2 image: ext2fs demands an absolute pathname beginning with '/',
and because qemu was inadvertantly stripping it, my nbdkit patch had
to work around the behavior.

[1] https://github.com/NetworkBlockDevice/nbd/blob/master/doc/uri.md
[2] https://www.redhat.com/archives/libguestfs/2020-February/msg00109.html

Note that the qemu bug only affects handling of URIs such as
nbd://host:port//abs/path (where '/abs/path' should be the export
name); it is still possible to use --image-opts and pass the desired
export name with a leading slash directly through JSON even without
this patch.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200212023101.1162686-1-eblake@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-02-26 14:45:02 -06:00
Max Reitz 804359b8b9 block: Fix VM size field width in snapshot dump
When printing the snapshot list (e.g. with qemu-img snapshot -l), the VM
size field is only seven characters wide.  As of de38b5005e, this is
not necessarily sufficient: We generally print three digits, and this
may require a decimal point.  Also, the unit field grew from something
as plain as "M" to " MiB".  This means that number and unit may take up
eight characters in total; but we also want spaces in front.

Considering previously the maximum width was four characters and the
field width was chosen to be three characters wider, let us adjust the
field width to be eleven now.

Fixes: de38b5005e
Buglink: https://bugs.launchpad.net/qemu/+bug/1859989
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200117105859.241818-2-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-20 16:43:42 +01:00
Max Reitz 80f0900905 iscsi: Drop iscsi_co_create_opts()
The generic fallback implementation effectively does the same.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200122164532.178040-5-mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-20 16:43:42 +01:00
Max Reitz 87ca3b8fa6 file-posix: Drop hdev_co_create_opts()
The generic fallback implementation effectively does the same.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200122164532.178040-4-mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-20 16:43:42 +01:00
Max Reitz 78c81a3f10 block/nbd: Fix hang in .bdrv_close()
When nbd_close() is called from a coroutine, the connection_co never
gets to run, and thus nbd_teardown_connection() hangs.

This is because aio_co_enter() only puts the connection_co into the main
coroutine's wake-up queue, so this main coroutine needs to yield and
wait for connection_co to terminate.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200122164532.178040-2-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-20 16:43:42 +01:00
Vladimir Sementsov-Ogievskiy 4bc267a7c7 block/backup-top: fix flags handling
backup-top "supports" write-unchanged, by skipping CBW operation in
backup_top_co_pwritev. But it forgets to do the same in
backup_top_co_pwrite_zeroes, as well as declare support for
BDRV_REQ_WRITE_UNCHANGED.

Fix this, and, while being here, declare also support for flags
supported by source child.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200207161231.32707-1-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-20 16:43:42 +01:00
Daniel P. Berrangé 087ab8e775 block: always fill entire LUKS header space with zeros
When initializing the LUKS header the size with default encryption
parameters will currently be 2068480 bytes. This is rounded up to
a multiple of the cluster size, 2081792, with 64k sectors. If the
end of the header is not the same as the end of the cluster we fill
the extra space with zeros. This was forgetting that not even the
space allocated for the header will be fully initialized, as we
only write key material for the first key slot. The space left
for the other 7 slots is never written to.

An optimization to the ref count checking code:

  commit a5fff8d4b4 (refs/bisect/bad)
  Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
  Date:   Wed Feb 27 16:14:30 2019 +0300

    qcow2-refcount: avoid eating RAM

made the assumption that every cluster which was allocated would
have at least some data written to it. This was violated by way
the LUKS header is only partially written, with much space simply
reserved for future use.

Depending on the cluster size this problem was masked by the
logic which wrote zeros between the end of the LUKS header and
the end of the cluster.

$ qemu-img create --object secret,id=cluster_encrypt0,data=123456 \
   -f qcow2 -o cluster_size=2k,encrypt.iter-time=1,\
               encrypt.format=luks,encrypt.key-secret=cluster_encrypt0 \
               cluster_size_check.qcow2 100M
  Formatting 'cluster_size_check.qcow2', fmt=qcow2 size=104857600
    encrypt.format=luks encrypt.key-secret=cluster_encrypt0
    encrypt.iter-time=1 cluster_size=2048 lazy_refcounts=off refcount_bits=16

$ qemu-img check --object secret,id=cluster_encrypt0,data=redhat \
    'json:{"driver": "qcow2", "encrypt.format": "luks", \
           "encrypt.key-secret": "cluster_encrypt0", \
           "file.driver": "file", "file.filename": "cluster_size_check.qcow2"}'
ERROR: counting reference for region exceeding the end of the file by one cluster or more: offset 0x2000 size 0x1f9000
Leaked cluster 4 refcount=1 reference=0
...snip...
Leaked cluster 130 refcount=1 reference=0

1 errors were found on the image.
Data may be corrupted, or further writes to the image may corrupt it.

127 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
Image end offset: 268288

The problem only exists when the disk image is entirely empty. Writing
data to the disk image payload will solve the problem by causing the
end of the file to be extended further.

The change fixes it by ensuring that the entire allocated LUKS header
region is fully initialized with zeros. The qemu-img check will still
fail for any pre-existing disk images created prior to this change,
unless at least 1 byte of the payload is written to.

Fully writing zeros to the entire LUKS header is a good idea regardless
as it ensures that space has been allocated on the host filesystem (or
whatever block storage backend is used).

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20200207135520.2669430-1-berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-20 16:43:42 +01:00
Peter Krempa facda5443f qapi: Allow getting flat output from 'query-named-block-nodes'
When a management application manages node names there's no reason to
recurse into backing images in the output of query-named-block-nodes.

Add a parameter to the command which will return just the top level
structs.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Message-Id: <4470f8c779abc404dcf65e375db195cd91a80651.1579509782.git.pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[mreitz: Fixed coding style]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-20 16:43:42 +01:00
Max Reitz 3c7f75b321 quorum: Stop marking it as a filter
Quorum is not a filter, for example because it cannot guarantee which of
its children will serve the next request.  Thus, any of its children may
differ from the data visible to quorum's parents.

We have other filters with multiple children, but they differ in this
aspect:

- blkverify quits the whole qemu process if its children differ.  As
  such, we can always skip it when we want to skip it (as a filter node)
  by going to any of its children.  Both have the same data.

- replication generally serves requests from bs->file, so this is its
  only actually filtered child.

- Block job filters currently only have one child, but they will
  probably get more children in the future.  Still, they will always
  have only one actually filtered child.

Having "filters" as a dedicated node category only makes sense if you
can skip them by going to a one fixed child that always shows the same
data as the filter node.  Quorum cannot fulfill this, so it is not a
filter.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200218103454.296704-13-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18 11:55:40 +01:00
Max Reitz 6e9cc05181 mirror: Double-check immediately before replacing
There is no guarantee that we can still replace the node we want to
replace at the end of the mirror job.  Double-check by calling
bdrv_recurse_can_replace().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200218103454.296704-12-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18 11:55:40 +01:00
Max Reitz 6b4907cf42 block: Remove bdrv_recurse_is_first_non_filter()
It no longer has any users.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200218103454.296704-11-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18 11:55:40 +01:00
Max Reitz a3ed794b36 quorum: Implement .bdrv_recurse_can_replace()
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200218103454.296704-9-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18 11:55:40 +01:00
Max Reitz 998a6b2fc5 blkverify: Implement .bdrv_recurse_can_replace()
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200218103454.296704-8-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18 11:55:39 +01:00
Max Reitz 37a3791b38 quorum: Fix child permissions
Quorum cannot share WRITE or RESIZE on its children.  Presumably, it
only does so because as a filter, it seemed intuitively correct to point
its .bdrv_child_perm to bdrv_filter_default_perm().

However, it is not really a filter, and bdrv_filter_default_perm() does
not work for it, so we have to provide a custom .bdrv_child_perm
implementation.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200218103454.296704-6-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18 11:55:39 +01:00
Philippe Mathieu-Daudé 74e4a8a961 block/io_uring: Remove superfluous semicolon
Fixes: 6663a0a337
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200218094402.26625-5-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18 10:54:02 +01:00
Kevin Wolf 9ad1e79f3f commit: Fix is_read for block_job_error_action()
block_job_error_action() needs to know if reading from the top node or
writing to the base node failed so that it can set the right 'operation'
in the BLOCK_JOB_ERROR QMP event.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200214200812.28180-6-kwolf@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18 10:53:56 +01:00
Kevin Wolf 0c42e175fc commit: Inline commit_populate()
commit_populate() is a very short function and only called in a single
place. Its return value doesn't tell us whether an error happened while
reading or writing, which would be necessary for sending the right data
in the BLOCK_JOB_ERROR QMP event.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200214200812.28180-5-kwolf@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18 10:53:56 +01:00
Kevin Wolf c5507b4d55 commit: Fix argument order for block_job_error_action()
The block_job_error_action() error call in the commit job gives the
on_err and is_read arguments in the wrong order. Fix this.

(Of course, hard-coded is_read = false is wrong, too, but that's a
separate problem for a separate patch.)

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200214200812.28180-4-kwolf@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18 10:53:56 +01:00
Kevin Wolf d71e65ec1d commit: Remove unused bytes_written
The bytes_written variable is only ever written to, it serves no
purpose. This has actually been the case since the commit job was first
introduced in commit 747ff60263.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200214200812.28180-3-kwolf@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18 10:53:56 +01:00
Philippe Mathieu-Daudé 5b1405db0f block/qcow2-bitmap: Remove unneeded variable assignment
Fix warning reported by Clang static code analyzer:

    CC      block/qcow2-bitmap.o
  block/qcow2-bitmap.c:650:5: warning: Value stored to 'ret' is never read
      ret = -EINVAL;
      ^     ~~~~~~~

Fixes: 88ddffae8
Reported-by: Clang Static Analyzer
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200215161557.4077-2-philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18 10:53:56 +01:00
Kevin Wolf c3b6658c1a qcow2: Fix qcow2_alloc_cluster_abort() for external data file
For external data file, cluster allocations return an offset in the data
file and are not refcounted. In this case, there is nothing to do for
qcow2_alloc_cluster_abort(). Freeing the same offset in the qcow2 file
is wrong and causes crashes in the better case or image corruption in
the worse case.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200211094900.17315-3-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18 10:53:56 +01:00
Kevin Wolf dea9052ef1 qcow2: update_refcount(): Reset old_table_index after qcow2_cache_put()
In the case that update_refcount() frees a refcount block, it evicts it
from the metadata cache. Before doing so, however, it returns the
currently used refcount block to the cache because it might be the same.
Returning the refcount block early means that we need to reset
old_table_index so that we reload the refcount block in the next
iteration if it is actually still in use.

Fixes: f71c08ea8e
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200211094900.17315-2-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18 10:53:56 +01:00
Hikaru Nishida 8475ea4854 block/vvfat: Do not unref qcow on closing backing bdrv
Before this commit, BDRVVVFATState.qcow is unrefed in write_target_close
on closing backing bdrv of vvfat. However, qcow bdrv is opend as a child
of vvfat in enable_write_target() so it will be also unrefed on closing
vvfat itself. This causes use-after-free of qcow on freeing vvfat which
has backing bdrv and qcow bdrv as children in this order because
bdrv_close(vvfat) tries to free qcow bdrv after freeing backing bdrv
as QLIST_FOREACH_SAFE() loop keeps next pointer, but BdrvChild of qcow
is already freed in bdrv_close(backing bdrv).

Signed-off-by: Hikaru Nishida <hikarupsp@gmail.com>
Message-Id: <20200209175156.85748-1-hikarupsp@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18 10:53:56 +01:00
Alberto Garcia 2d4b5256cf qcow2: Fix alignment checks in encrypted images
I/O requests to encrypted media should be aligned to the sector size
used by the underlying encryption method, not to BDRV_SECTOR_SIZE.
Fortunately this doesn't break anything at the moment because
both existing QCRYPTO_BLOCK_*_SECTOR_SIZE have the same value as
BDRV_SECTOR_SIZE.

The checks in qcow2_co_preadv_encrypted() are also unnecessary because
they are repeated immediately afterwards in qcow2_co_encdec().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200213171646.15876-1-berto@igalia.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18 10:53:56 +01:00
Kevin Wolf 7e6c4ff792 mirror: Don't let an operation wait for itself
mirror_wait_for_free_in_flight_slot() just picks a random operation to
wait for. However, when mirror_co_read() waits for free slots, its
MirrorOp is already in s->ops_in_flight, so if not enough slots are
immediately available, an operation can end up waiting for itself to
complete, which results in a hang.

Fix this by passing the current MirrorOp and skipping this operation
when picking an operation to wait for.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1794692
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2020-02-18 10:53:56 +01:00
Kevin Wolf eed325b92c mirror: Store MirrorOp.co for debuggability
If a coroutine is launched, but the coroutine pointer isn't stored
anywhere, debugging any problems inside the coroutine is quite hard.
Let's store the coroutine pointer of a mirror operation in MirrorOp to
have it available in the debugger.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2020-02-18 10:53:56 +01:00
Vladimir Sementsov-Ogievskiy ac9d00bf7b block: fix crash on zero-length unaligned write and read
Commit 7a3f542fbd "block/io: refactor padding" occasionally dropped
aligning for zero-length request: bdrv_init_padding() blindly return
false if bytes == 0, like there is nothing to align.

This leads the following command to crash:

./qemu-io --image-opts -c 'write 1 0' \
  driver=blkdebug,align=512,image.driver=null-co,image.size=512

>> qemu-io: block/io.c:1955: bdrv_aligned_pwritev: Assertion
    `(offset & (align - 1)) == 0' failed.
>> Aborted (core dumped)

Prior to 7a3f542fbd we does aligning of such zero requests. Instead of
recovering this behavior let's just do nothing on such requests as it
is useless.

Note that driver may have special meaning of zero-length reqeusts, like
qcow2_co_pwritev_compressed_part, so we can't skip any zero-length
operation. But for unaligned ones, we can't pass it to driver anyway.

This commit also fixes crash in iotest 80 running with -nocache:

./check -nocache -qcow2 80

which crashes on same assertion due to trying to read empty extra data
in qcow2_do_read_snapshots().

Cc: qemu-stable@nongnu.org # v4.2
Fixes: 7a3f542fbd
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20200206164245.17781-1-vsementsov@virtuozzo.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-02-07 16:46:59 +00:00
Vladimir Sementsov-Ogievskiy 0df62f45c1 block/backup-top: fix failure path
We can't access top after call bdrv_backup_top_drop, as it is already
freed at this time.

Also, no needs to unref target child by hand, it will be unrefed on
bdrv_close() automatically.

So, just do bdrv_backup_top_drop if append succeed and one bdrv_unref
otherwise.

Note, that in !appended case bdrv_unref(top) moved into drained section
on source. It doesn't really matter, but just for code simplicity.

Fixes: 7df7868b96
Cc: qemu-stable@nongnu.org # v4.2.0
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20200121142802.21467-2-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-06 13:47:45 +01:00
Alberto Garcia 3afea40243 qcow2: Use BDRV_SECTOR_SIZE instead of the hardcoded value
This replaces all remaining instances in the qcow2 code.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: b5f74b606c2d9873b12d29acdb7fd498029c4025.1579374329.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-06 13:47:45 +01:00
Alberto Garcia 25ae71db55 qcow2: Don't require aligned offsets in qcow2_co_copy_range_from()
qemu-img's convert_co_copy_range() operates at the sector level and
block_copy() operates at the cluster level so this condition is always
true, but it is not necessary to restrict this here, so let's leave it
to the driver implementation return an error if there is any.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: a4264aaee656910c84161a2965f7a501437379ca.1579374329.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-06 13:47:45 +01:00
Alberto Garcia da86f8cbad qcow2: Use bs->bl.request_alignment when updating an L1 entry
When updating an L1 entry the qcow2 driver writes a (512-byte) sector
worth of data to avoid a read-modify-write cycle. Instead of always
writing 512 bytes we should follow the alignment requirements of the
storage backend.

(the only exception is when the alignment is larger than the cluster
size because then we could be overwriting data after the L1 table)

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 71f34d4ae4b367b32fb36134acbf4f4f7ee681f4.1579374329.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-06 13:47:45 +01:00
Alberto Garcia 344ffea951 qcow2: Tighten cluster_offset alignment assertions
qcow2_alloc_cluster_offset() and qcow2_get_cluster_offset() always
return offsets that are cluster-aligned so don't just check that they
are sector-aligned.

The check in qcow2_co_preadv_task() is also replaced by an assertion
for the same reason.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 558ba339965f858bede4c73ce3f50f0c0493597d.1579374329.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-06 13:47:45 +01:00
Alberto Garcia ef97d608c7 qcow2: Don't round the L1 table allocation up to the sector size
The L1 table is read from disk using the byte-based bdrv_pread() and
is never accessed beyond its last element, so there's no need to
allocate more memory than that.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: b2e27214ec7b03a585931bcf383ee1ac3a641a10.1579374329.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-06 13:47:45 +01:00
Alberto Garcia 7cdca2e233 qcow2: Use a GString in report_unsupported_feature()
This is a bit more efficient than having to allocate and free memory
for each item.

The default size (60) is enough for all the existing incompatible
features or the "Unknown incompatible feature" message.

Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20200115135626.19442-1-berto@igalia.com
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-06 13:47:45 +01:00
Alberto Garcia 3a75a870ef qcow2: Assert that host cluster offsets fit in L2 table entries
The standard cluster descriptor in L2 table entries has a field to
store the host cluster offset. When we need to get that offset from an
entry we use L2E_OFFSET_MASK to ensure that we only use the bits that
belong to that field.

But while that mask is used every time we read from an L2 entry, it
is never used when we write to it. Due to the QCOW_MAX_CLUSTER_OFFSET
limit set in the cluster allocation code QEMU can never produce
offsets that don't fit in that field so any such offset would indicate
a bug in QEMU.

Compressed cluster descriptors contain two fields (host cluster offset
and size of the compressed data) and the situation with them is
similar. In this case the masks are not constant but are stored in the
csize_mask and cluster_offset_mask fields of BDRVQcow2State.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20200113161146.20099-1-berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-06 13:47:45 +01:00
Aarushi Mehta daffeb027b block/io_uring: adds userspace completion polling
Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200120141858.587874-11-stefanha@redhat.com
Message-Id: <20200120141858.587874-11-stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-01-30 20:59:42 +00:00
Aarushi Mehta d803f59050 block: add trace events for io_uring
Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200120141858.587874-10-stefanha@redhat.com
Message-Id: <20200120141858.587874-10-stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-01-30 20:59:42 +00:00
Aarushi Mehta c644751069 block/file-posix.c: extend to use io_uring
Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
Reviewed-by: Maxim Levitsky <maximlevitsky@gmail.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200120141858.587874-9-stefanha@redhat.com
Message-Id: <20200120141858.587874-9-stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-01-30 20:59:42 +00:00
Aarushi Mehta 6663a0a337 block/io_uring: implements interfaces for io_uring
Aborts when sqe fails to be set as sqes cannot be returned to the
ring. Adds slow path for short reads for older kernels

Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200120141858.587874-5-stefanha@redhat.com
Message-Id: <20200120141858.587874-5-stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-01-30 20:59:41 +00:00
Paolo Bonzini 3ba0e1a00c block/io: take bs->reqs_lock in bdrv_mark_request_serialising
bdrv_mark_request_serialising is writing the overlap_offset and
overlap_bytes fields of BdrvTrackedRequest.  Take bs->reqs_lock
for the whole duration of it, and not just when waiting for
serialising requests, so that tracked_request_overlaps does not
look at a half-updated request.

The new code does not unlock/relock around retries.  This is unnecessary
because a retry is always preceded by a CoQueue wait, which already
releases and reacquires bs->reqs_lock.

Reported-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1578495356-46219-4-git-send-email-pbonzini@redhat.com
Message-Id: <1578495356-46219-4-git-send-email-pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-01-30 20:59:41 +00:00
Paolo Bonzini 18fbd0dec7 block/io: wait for serialising requests when a request becomes serialising
Marking without waiting would not result in actual serialising behavior.
Thus, make a call bdrv_mark_request_serialising sufficient for
serialisation to happen.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1578495356-46219-3-git-send-email-pbonzini@redhat.com
Message-Id: <1578495356-46219-3-git-send-email-pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-01-30 20:59:41 +00:00
Paolo Bonzini c53cb42769 block: eliminate BDRV_REQ_NO_SERIALISING
It is unused since commit 00e30f0 ("block/backup: use backup-top instead
of write notifiers", 2019-10-01), drop it to simplify the code.

While at it, drop redundant assertions on flags.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1578495356-46219-2-git-send-email-pbonzini@redhat.com
Message-Id: <1578495356-46219-2-git-send-email-pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-01-30 20:59:41 +00:00
Kevin Wolf 5fbf1d56c2 iscsi: Don't access non-existent scsi_lba_status_descriptor
In iscsi_co_block_status(), we may have received num_descriptors == 0
from the iscsi server. Therefore, we can't unconditionally access
lbas->descriptors[0]. Add the missing check.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Felipe Franciosi <felipe@nutanix.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
2020-01-27 17:19:53 +01:00
Felipe Franciosi 693fd2acdf iscsi: Cap block count from GET LBA STATUS (CVE-2020-1711)
When querying an iSCSI server for the provisioning status of blocks (via
GET LBA STATUS), Qemu only validates that the response descriptor zero's
LBA matches the one requested. Given the SCSI spec allows servers to
respond with the status of blocks beyond the end of the LUN, Qemu may
have its heap corrupted by clearing/setting too many bits at the end of
its allocmap for the LUN.

A malicious guest in control of the iSCSI server could carefully program
Qemu's heap (by selectively setting the bitmap) and then smash it.

This limits the number of bits that iscsi_co_block_status() will try to
update in the allocmap so it can't overflow the bitmap.

Fixes: CVE-2020-1711
Cc: qemu-stable@nongnu.org
Signed-off-by: Felipe Franciosi <felipe@nutanix.com>
Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com>
Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-01-27 17:19:53 +01:00
Eiichi Tsukata fb574de81b block/backup: fix memory leak in bdrv_backup_top_append()
bdrv_open_driver() allocates bs->opaque according to drv->instance_size.
There is no need to allocate it and overwrite opaque in
bdrv_backup_top_append().

Reproducer:

  $ QTEST_QEMU_BINARY=./x86_64-softmmu/qemu-system-x86_64 valgrind -q --leak-check=full tests/test-replication -p /replication/secondary/start
  ==29792== 24 bytes in 1 blocks are definitely lost in loss record 52 of 226
  ==29792==    at 0x483AB1A: calloc (vg_replace_malloc.c:762)
  ==29792==    by 0x4B07CE0: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.6000.7)
  ==29792==    by 0x12BAB9: bdrv_open_driver (block.c:1289)
  ==29792==    by 0x12BEA9: bdrv_new_open_driver (block.c:1359)
  ==29792==    by 0x1D15CB: bdrv_backup_top_append (backup-top.c:190)
  ==29792==    by 0x1CC11A: backup_job_create (backup.c:439)
  ==29792==    by 0x1CD542: replication_start (replication.c:544)
  ==29792==    by 0x1401B9: replication_start_all (replication.c:52)
  ==29792==    by 0x128B50: test_secondary_start (test-replication.c:427)
  ...

Fixes: 7df7868b96 ("block: introduce backup-top filter driver")
Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-01-27 17:19:53 +01:00
Sergio Lopez 0abf258171 block/backup-top: Don't acquire context while dropping top
All paths that lead to bdrv_backup_top_drop(), except for the call
from backup_clean(), imply that the BDS AioContext has already been
acquired, so doing it there too can potentially lead to QEMU hanging
on AIO_WAIT_WHILE().

An easy way to trigger this situation is by issuing a two actions
transaction, with a proper and a bogus blockdev-backup, so the second
one will trigger a rollback. This will trigger a hang with an stack
trace like this one:

 #0  0x00007fb680c75016 in __GI_ppoll (fds=0x55e74580f7c0, nfds=1, timeout=<optimized out>,
     timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
 #1  0x000055e743386e09 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>)
     at /usr/include/bits/poll2.h:77
 #2  0x000055e743386e09 in qemu_poll_ns
     (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:336
 #3  0x000055e743388dc4 in aio_poll (ctx=0x55e7458925d0, blocking=blocking@entry=true)
     at util/aio-posix.c:669
 #4  0x000055e743305dea in bdrv_flush (bs=bs@entry=0x55e74593c0d0) at block/io.c:2878
 #5  0x000055e7432be58e in bdrv_close (bs=0x55e74593c0d0) at block.c:4017
 #6  0x000055e7432be58e in bdrv_delete (bs=<optimized out>) at block.c:4262
 #7  0x000055e7432be58e in bdrv_unref (bs=bs@entry=0x55e74593c0d0) at block.c:5644
 #8  0x000055e743316b9b in bdrv_backup_top_drop (bs=bs@entry=0x55e74593c0d0) at block/backup-top.c:273
 #9  0x000055e74331461f in backup_job_create
     (job_id=0x0, bs=bs@entry=0x55e7458d5820, target=target@entry=0x55e74589f640, speed=0, sync_mode=MIRROR_SYNC_MODE_FULL, sync_bitmap=sync_bitmap@entry=0x0, bitmap_mode=BITMAP_SYNC_MODE_ON_SUCCESS, compress=false, filter_node_name=0x0, on_source_error=BLOCKDEV_ON_ERROR_REPORT, on_target_error=BLOCKDEV_ON_ERROR_REPORT, creation_flags=0, cb=0x0, opaque=0x0, txn=0x0, errp=0x7ffddfd1efb0) at block/backup.c:478
 #10 0x000055e74315bc52 in do_backup_common
     (backup=backup@entry=0x55e746c066d0, bs=bs@entry=0x55e7458d5820, target_bs=target_bs@entry=0x55e74589f640, aio_context=aio_context@entry=0x55e7458a91e0, txn=txn@entry=0x0, errp=errp@entry=0x7ffddfd1efb0)
     at blockdev.c:3580
 #11 0x000055e74315c37c in do_blockdev_backup
     (backup=backup@entry=0x55e746c066d0, txn=0x0, errp=errp@entry=0x7ffddfd1efb0)
     at /usr/src/debug/qemu-kvm-4.2.0-2.module+el8.2.0+5135+ed3b2489.x86_64/./qapi/qapi-types-block-core.h:1492
 #12 0x000055e74315c449 in blockdev_backup_prepare (common=0x55e746a8de90, errp=0x7ffddfd1f018)
     at blockdev.c:1885
 #13 0x000055e743160152 in qmp_transaction
     (dev_list=<optimized out>, has_props=<optimized out>, props=0x55e7467fe2c0, errp=errp@entry=0x7ffddfd1f088) at blockdev.c:2340
 #14 0x000055e743287ff5 in qmp_marshal_transaction
     (args=<optimized out>, ret=<optimized out>, errp=0x7ffddfd1f0f8)
     at qapi/qapi-commands-transaction.c:44
 #15 0x000055e74333de6c in do_qmp_dispatch
     (errp=0x7ffddfd1f0f0, allow_oob=<optimized out>, request=<optimized out>, cmds=0x55e743c28d60 <qmp_commands>) at qapi/qmp-dispatch.c:132
 #16 0x000055e74333de6c in qmp_dispatch
     (cmds=0x55e743c28d60 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>)
     at qapi/qmp-dispatch.c:175
 #17 0x000055e74325c061 in monitor_qmp_dispatch (mon=0x55e745908030, req=<optimized out>)
     at monitor/qmp.c:145
 #18 0x000055e74325c6fa in monitor_qmp_bh_dispatcher (data=<optimized out>) at monitor/qmp.c:234
 #19 0x000055e743385866 in aio_bh_call (bh=0x55e745807ae0) at util/async.c:117
 #20 0x000055e743385866 in aio_bh_poll (ctx=ctx@entry=0x55e7458067a0) at util/async.c:117
 #21 0x000055e743388c54 in aio_dispatch (ctx=0x55e7458067a0) at util/aio-posix.c:459
 #22 0x000055e743385742 in aio_ctx_dispatch
     (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260
 #23 0x00007fb68543e67d in g_main_dispatch (context=0x55e745893a40) at gmain.c:3176
 #24 0x00007fb68543e67d in g_main_context_dispatch (context=context@entry=0x55e745893a40) at gmain.c:3829
 #25 0x000055e743387d08 in glib_pollfds_poll () at util/main-loop.c:219
 #26 0x000055e743387d08 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
 #27 0x000055e743387d08 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
 #28 0x000055e74316a3c1 in main_loop () at vl.c:1828
 #29 0x000055e743016a72 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
     at vl.c:4504

Fix this by not acquiring the AioContext there, and ensuring all paths
leading to it have it already acquired (backup_clean()).

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1782111
Signed-off-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-01-27 17:19:53 +01:00
Wangyong 2558cb8dd4 linux-aio: increasing MAX_EVENTS to a larger hardcoded value
Since commit 6040aedddb "virtio-blk:
make queue size configurable",if the user set the queue size to
more than 128 ,it will not take effect. That's because linux aio's
maximum outstanding requests at a time is always less than or equal
to 128.

This patch simply increase MAX_EVENTS to a larger hardcoded value of
1024 as a shortterm fix.

Signed-off-by: wangyong <wang.yongD@h3c.com>
Message-id: faa5781afd354a96a0be152b288f636f@h3c.com
Message-Id: <faa5781afd354a96a0be152b288f636f@h3c.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-01-13 16:41:45 +00:00
Max Reitz 503ca1262b backup-top: Begin drain earlier
When dropping backup-top, we need to drain the node before freeing the
BlockCopyState.  Otherwise, requests may still be in flight and then the
assertion in shres_destroy() will fail.

(This becomes visible in intermittent failure of 056.)

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191219182638.104621-1-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-01-06 14:26:23 +01:00
Andrey Shinkevich 0d483dce38 qcow2: Allow writing compressed data of multiple clusters
QEMU currently supports writing compressed data of the size equal to
one cluster. This patch allows writing QCOW2 compressed data that
exceed one cluster. Now, we split buffered data into separate clusters
and write them compressed using the block/aio_task API.

Suggested-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1575288906-551879-3-git-send-email-andrey.shinkevich@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-01-06 13:43:07 +01:00
Andrey Shinkevich f41388e0fb block: introduce compress filter driver
Allow writing all the data compressed through the filter driver.
The written data will be aligned by the cluster size.
Based on the QEMU current implementation, that data can be written to
unallocated clusters only. May be used for a backup job.

Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 1575288906-551879-2-git-send-email-andrey.shinkevich@virtuozzo.com
[mreitz: Replace NULL bdrv_get_format_name() by "(no format)"]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-01-06 13:43:07 +01:00
Vladimir Sementsov-Ogievskiy a1db8733d2 qcow2-bitmaps: fix qcow2_can_store_new_dirty_bitmap
qcow2_can_store_new_dirty_bitmap works wrong, as it considers only
bitmaps already stored in the qcow2 image and ignores persistent
BdrvDirtyBitmap objects.

So, let's instead count persistent BdrvDirtyBitmaps. We load all qcow2
bitmaps on open, so there should not be any bitmap in the image for
which we don't have BdrvDirtyBitmaps version. If it is - it's a kind of
corruption, and no reason to check for corruptions here (open() and
close() are better places for it).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20191014115126.15360-2-vsementsov@virtuozzo.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-01-06 13:43:06 +01:00
PanNengyuan 88be15a9e1 throttle-groups: fix memory leak in throttle_group_set_limit:
This avoid a memory leak when qom-set is called to set throttle_group
limits, here is an easy way to reproduce:

1. run qemu-iotests as follow and check the result with asan:
       ./check -qcow2 184

Following is the asan output backtrack:
Direct leak of 912 byte(s) in 3 object(s) allocated from:
    #0 0xffff8d7ab3c3 in __interceptor_calloc   (/lib64/libasan.so.4+0xd33c3)
    #1 0xffff8d4c31cb in g_malloc0   (/lib64/libglib-2.0.so.0+0x571cb)
    #2 0x190c857 in qobject_input_start_struct  /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qapi/qobject-input-visitor.c:295
    #3 0x19070df in visit_start_struct   /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qapi/qapi-visit-core.c:49
    #4 0x1948b87 in visit_type_ThrottleLimits   qapi/qapi-visit-block-core.c:3759
    #5 0x17e4aa3 in throttle_group_set_limits   /mnt/sdc/qemu-master/qemu-4.2.0-rc0/block/throttle-groups.c:900
    #6 0x1650eff in object_property_set     /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qom/object.c:1272
    #7 0x1658517 in object_property_set_qobject   /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qom/qom-qobject.c:26
    #8 0x15880bb in qmp_qom_set  /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qom/qom-qmp-cmds.c:74
    #9 0x157e3e3 in qmp_marshal_qom_set  qapi/qapi-commands-qom.c:154

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: PanNengyuan <pannengyuan@huawei.com>
Message-id: 1574835614-42028-1-git-send-email-pannengyuan@huawei.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-01-06 13:43:06 +01:00
Max Reitz 69c6449ff1 blkdebug: Allow taking/unsharing permissions
Sometimes it is useful to be able to add a node to the block graph that
takes or unshare a certain set of permissions for debugging purposes.
This patch adds this capability to blkdebug.

(Note that you cannot make blkdebug release or share permissions that it
needs to take or cannot share, because this might result in assertion
failures in the block layer.  But if the blkdebug node has no parents,
it will not take any permissions and share everything by default, so you
can then freely choose what permissions to take and share.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191108123455.39445-4-mreitz@redhat.com
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-01-06 13:43:06 +01:00
Peter Maydell 1010af540b Block layer patches:
- qemu-img: fix info --backing-chain --image-opts
 - Error out on image creation with conflicting size options
 - Fix external snapshot with VM state
 - hmp: Allow using qdev ID for qemu-io command
 - Misc code cleanup
 - Many iotests improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJd+7H/AAoJEH8JsnLIjy/WwIoP/igzSCfiH7yOzWexk7J7DUh7
 SplXRiBQSQAZ8KD2umUk80UO6/W19rHAl3bLLu50D5uzQ5PNz1wSW2S81JZsj0nQ
 ErUEr5yxLGAI/zHIw/LrjUSEygbqcZ21R7foNmIy3lInEXf6gGwPQEnZnZgGkH0x
 v5jwv1HpUpEyetnHunX2cK3JpSNEJCYsV+X1nzDwhjYFuGQq0nlhgUZ8BDt6+fEr
 b/S3TuHmj/FXMH+5wrSr0LiCKaIyAwPdREvC+61aklMNFuA8YwF33tOaJLoWeQCD
 qhxFA8jVd8b3dQ0NZXXbliXST5d6AzjpeqbNzsyKze1duVfcfF1RYsC6McojpQ7f
 9qFIXPC9IHe3sNkUZpTtvONLnmCeHTc9lwVIyVp9B5KXcmcLedvYM04WqhZ+/eeJ
 BfgCvXOyF9sL2GaVPFD+98E2DOYG4dI4rmzqn98kQj+RAXDFvvJ+zxuDUC9omE6T
 pVvxczGPXmnxP2ITRNljJyLVCN1YgE2g9S0GAXNM2i4uOXvhFToEwrJLFytVeVsw
 UwtYkE0F8KJyqjje5N3HiXclBVa++PK3+jjNFA+3B6XZ9wcZZRS2ZXmzsyNOJ7qG
 AeIPtrqBCz+QKsHtmS9CwYpvLrPafobHA3WS2z0dsevcWBHYBQBEM2EgXRFC3N3a
 mymYGxMOsG6+8onwqwjJ
 =laVP
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- qemu-img: fix info --backing-chain --image-opts
- Error out on image creation with conflicting size options
- Fix external snapshot with VM state
- hmp: Allow using qdev ID for qemu-io command
- Misc code cleanup
- Many iotests improvements

# gpg: Signature made Thu 19 Dec 2019 17:23:11 GMT
# gpg:                using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (30 commits)
  iotests: Test external snapshot with VM state
  hmp: Allow using qdev ID for qemu-io command
  block: Activate recursively even for already active nodes
  iotests: 211: Remove duplication with VM.blockdev_create()
  iotests: 207: Remove duplication with VM.blockdev_create()
  iotests: 266: Convert to VM.blockdev_create()
  iotests: 237: Convert to VM.blockdev_create()
  iotests: 213: Convert to VM.blockdev_create()
  iotests: 212: Convert to VM.blockdev_create()
  iotests: 210: Convert to VM.blockdev_create()
  iotests: 206: Convert to VM.blockdev_create()
  iotests: 255: Drop blockdev_create()
  iotests: Create VM.blockdev_create()
  qcow2: Move error check of local_err near its assignment
  iotests: Fix IMGOPTSSYNTAX for nbd
  iotests/273: Filter format-specific information
  iotests: Add more "_require_drivers" checks to the shell-based tests
  MAINTAINERS: fix qcow2-bitmap.c under Dirty Bitmaps header
  qcow2: Use offset_into_cluster()
  iotests: Support job-complete in run_job()
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-12-20 18:25:32 +00:00
Tuguoyi 66be5c3e78 qcow2: Move error check of local_err near its assignment
The local_err check outside of the if block was necessary
when it was introduced in commit d1258dd0c8 because it needed to be
executed even if qcow2_load_autoloading_dirty_bitmaps() returned false.

After some modifications that all required the error check to remain
where it is, commit 9c98f145df finally moved the
qcow2_load_dirty_bitmaps() call into the if block, so now the error
check should be there, too.

Signed-off-by: Guoyi Tu <tu.guoyi@h3c.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-12-19 14:31:52 +01:00
Alberto Garcia 74e60fb56a qcow2: Use offset_into_cluster()
There's a couple of places left in the qcow2 code that still do the
calculation manually, so let's replace them.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-12-18 11:21:16 +01:00
Kevin Wolf 3b65081638 qcow2: Declare BDRV_REQ_NO_FALLBACK supported
In the common case, qcow2_co_pwrite_zeroes() already only modifies
metadata case, so we're fine with or without BDRV_REQ_NO_FALLBACK set.

The only exception is when using an external data file, where the
request is passed down to the block driver of the external data file. We
are forwarding the BDRV_REQ_NO_FALLBACK flag there, though, so this is
fine, too.

Declare the flag supported therefore.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2019-12-18 11:21:16 +01:00
Vladimir Sementsov-Ogievskiy d936613547 nbd: assert that Error** is not NULL in nbd_iter_channel_error
All callers of nbd_iter_channel_error() pass the address of a
local_err variable, and only call this function if an error has
already occurred, using this function to propagate that error.
This is already implied by its name (local_err instead of the classic
errp), but it is worth additionally stressing this by adding an
assertion to make it part of the function contract.

The local_err parameter is not here to return information about
nbd_iter_channel_error failure. Instead it's assumed to be filled when
passed to the function. This is already stressed by its name
(local_err, instead of classic errp). Stress it additionally by
assertion.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20191205174635.18758-22-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2019-12-18 08:43:19 +01:00
Vladimir Sementsov-Ogievskiy e53a578a8b block/snapshot: rename Error ** parameter to more common errp
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@Redhat.com>
Message-Id: <20191205174635.18758-11-vsementsov@virtuozzo.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2019-12-18 08:43:19 +01:00
Vladimir Sementsov-Ogievskiy f56281abd9 block/qcow2-bitmap: fix crash bug in qcow2_co_remove_persistent_dirty_bitmap
Here is double bug:

First, return error but not set errp. This may lead to:
qmp block-dirty-bitmap-remove may report success when actually failed

block-dirty-bitmap-remove used in a transaction will crash, as
qmp_transaction will think that it returned success and will call
block_dirty_bitmap_remove_commit which will crash, as state->bitmap is
NULL

Second (like in anecdote), this case is not an error at all. As it is
documented in the comment above bdrv_co_remove_persistent_dirty_bitmap
definition, absence of bitmap is not an error, and similar case handled
at start of qcow2_co_remove_persistent_dirty_bitmap, it returns 0 when
there is no bitmaps at all.

But when there are some bitmaps, but not the requested one, it return
error with errp unset.

Fix that.

Trigger:
1. create persistent bitmap A
2. shutdown vm  (bitmap A is synced)
3. start vm
4. create persistent bitmap B
5. remove bitmap B - it fails (and crashes if in transaction)

Potential workaround (rather invasive to ask clients to implement it):
1. create persistent bitmap A
2. shutdown vm
3. start vm
4. create persistent bitmap B
5. remember, that we want to remove bitmap B after vm shutdown
...
  some other operations
...
6. vm shutdown
7. start vm in stopped mode, and remove all bitmaps marked for removing
8. stop vm

Fixes: b56a1e3175
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20191205193049.30666-1-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
[eblake: commit message tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-12-09 09:23:04 -06:00
Markus Armbruster cb09104ea8 block/file-posix: Fix laio_init() error handling crash bug
raw_aio_attach_aio_context() passes uninitialized Error *local_err by
reference to laio_init() via aio_setup_linux_aio().  When laio_init()
fails, it passes it on to error_setg_errno(), tripping error_setv()'s
assertion unless @local_err is null by dumb luck.

Fix by initializing @local_err properly.

Fixes: ed6e216171
Cc: Nishanth Aravamudan <naravamudan@digitalocean.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20191130194240.10517-4-armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2019-12-02 16:14:41 +01:00
Vladimir Sementsov-Ogievskiy a505475b95 block/qcow2-bitmap: fix bitmap migration
Fix bitmap migration with dirty-bitmaps capability enabled and shared
storage. We should ignore IN_USE bitmaps in the image on target, when
migrating bitmaps through migration channel, see new comment below.

Fixes: 74da6b9435
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20191125125229.13531-2-vsementsov@virtuozzo.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-11-26 14:18:01 +01:00
Eric Blake 93676c88d7 nbd: Don't send oversize strings
Qemu as server currently won't accept export names larger than 256
bytes, nor create dirty bitmap names longer than 1023 bytes, so most
uses of qemu as client or server have no reason to get anywhere near
the NBD spec maximum of a 4k limit per string.

However, we weren't actually enforcing things, ignoring when the
remote side violates the protocol on input, and also having several
code paths where we send oversize strings on output (for example,
qemu-nbd --description could easily send more than 4k).  Tighten
things up as follows:

client:
- Perform bounds check on export name and dirty bitmap request prior
  to handing it to server
- Validate that copied server replies are not too long (ignoring
  NBD_INFO_* replies that are not copied is not too bad)
server:
- Perform bounds check on export name and description prior to
  advertising it to client
- Reject client name or metadata query that is too long
- Adjust things to allow full 4k name limit rather than previous
  256 byte limit

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20191114024635.11363-4-eblake@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-11-18 16:01:34 -06:00
Eric Blake cf7c49cf6a bitmap: Enforce maximum bitmap name length
We document that for qcow2 persistent bitmaps, the name cannot exceed
1023 bytes.  It is inconsistent if transient bitmaps do not have to
abide by the same limit, and it is unlikely that any existing client
even cares about using bitmap names this long.  It's time to codify
that ALL bitmaps managed by qemu (whether persistent in qcow2 or not)
have a documented maximum length.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20191114024635.11363-3-eblake@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-11-18 16:01:34 -06:00
Max Reitz 24552feb6a qcow2: Fix QCOW2_COMPRESSED_SECTOR_MASK
Masks for L2 table entries should have 64 bit.

Fixes: b6c246942b
Buglink: https://bugs.launchpad.net/qemu/+bug/1850000
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191028161841.1198-2-mreitz@redhat.com
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-11-07 14:37:46 +01:00
Tuguoyi 570542ecb1 qcow2-bitmap: Fix uint64_t left-shift overflow
There are two issues in In check_constraints_on_bitmap(),
1) The sanity check on the granularity will cause uint64_t
integer left-shift overflow when cluster_size is 2M and the
granularity is BIGGER than 32K.
2) The way to calculate image size that the maximum bitmap
supported can map to is a bit incorrect.
This patch fix it by add a helper function to calculate the
number of bytes needed by a normal bitmap in image and compare
it to the maximum bitmap bytes supported by qemu.

Fixes: 5f72826e7f
Signed-off-by: Guoyi Tu <tu.guoyi@h3c.com>
Message-id: 4ba40cd1e7ee4a708b40899952e49f22@h3c.com
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-11-07 14:37:33 +01:00
Max Reitz 292d06b925 block/file-posix: Let post-EOF fallocate serialize
The XFS kernel driver has a bug that may cause data corruption for qcow2
images as of qemu commit c8bb23cbdb.  We can work around it by
treating post-EOF fallocates as serializing up until infinity (INT64_MAX
in practice).

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191101152510.11719-4-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-11-04 09:33:51 +01:00
Max Reitz c28107e9e5 block: Add bdrv_co_get_self_request()
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191101152510.11719-3-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-11-04 09:32:51 +01:00
Max Reitz 304d9d7f03 block: Make wait/mark serialising requests public
Make both bdrv_mark_request_serialising() and
bdrv_wait_serialising_requests() public so they can be used from block
drivers.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191101152510.11719-2-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-11-04 09:29:15 +01:00
Vladimir Sementsov-Ogievskiy dcfbece684 block/block-copy: fix s->copy_size for compressed cluster
0e2402452f allowed writes larger than cluster, but that's
unsupported for compressed write. Fix it.

Fixes: 0e2402452f
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20191029150934.26416-1-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-11-04 09:21:45 +01:00
Peter Maydell aaffb85335 Block patches for softfreeze:
- iotest patches
 - Improve performance of the mirror block job in write-blocking mode
 - Limit memory usage for the backup block job
 - Add discard and write-zeroes support to the NVMe host block driver
 - Fix a bug in the mirror job
 - Prevent the qcow2 driver from creating technically non-compliant qcow2
   v3 images (where there is not enough extra data for snapshot table
   entries)
 - Allow callers of bdrv_truncate() (etc.) to determine whether the file
   must be resized to the exact given size or whether it is OK for block
   devices not to shrink
 -----BEGIN PGP SIGNATURE-----
 
 iQFGBAABCAAwFiEEkb62CjDbPohX0Rgp9AfbAGHVz0AFAl2224ESHG1yZWl0ekBy
 ZWRoYXQuY29tAAoJEPQH2wBh1c9AeXMH/RXKEX4BZYMRKCe41P18tJC9Bl2x0T20
 YeOsZVvpARlr7o/36BF2kGFF4MnL0OQ+9ELuyROX865rk/VL2rWqnHDE5oQM889a
 dFwMs+0zvNbig3iLNcw0H5OkE2mrdM+a1EUdn/lBe/39Z8dPqPxRGqIYHq38Ugdu
 emwSy1nWen7o0f71HRJfyVtI3KcrzXx71FrA/FY2yL/eHz+zRYGZj2SpAdFPkXP/
 lgaz+m0tWhnSW1QzEOXB0Gh69ULt/DczCinYmv5qUY1noW5TPPtiDNCQTts5O4ba
 oJsR3AJv5/l9m65JTmiyQSqnQfPcstrQ5FqOcSnP637cfqUFyWsvdks=
 =L7v1
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2019-10-28' into staging

Block patches for softfreeze:
- iotest patches
- Improve performance of the mirror block job in write-blocking mode
- Limit memory usage for the backup block job
- Add discard and write-zeroes support to the NVMe host block driver
- Fix a bug in the mirror job
- Prevent the qcow2 driver from creating technically non-compliant qcow2
  v3 images (where there is not enough extra data for snapshot table
  entries)
- Allow callers of bdrv_truncate() (etc.) to determine whether the file
  must be resized to the exact given size or whether it is OK for block
  devices not to shrink

# gpg: Signature made Mon 28 Oct 2019 12:13:53 GMT
# gpg:                using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40
# gpg:                issuer "mreitz@redhat.com"
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full]
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* remotes/maxreitz/tags/pull-block-2019-10-28: (69 commits)
  qemu-iotests: restrict 264 to qcow2 only
  Revert "qemu-img: Check post-truncation size"
  block: Pass truncate exact=true where reasonable
  block: Let format drivers pass @exact
  block: Evaluate @exact in protocol drivers
  block: Add @exact parameter to bdrv_co_truncate()
  block: Do not truncate file node when formatting
  block/cor: Drop cor_co_truncate()
  block: Handle filter truncation like native impl.
  iotests: Test qcow2's snapshot table handling
  iotests: Add peek_file* functions
  qcow2: Fix v3 snapshot table entry compliancy
  qcow2: Repair snapshot table with too many entries
  qcow2: Fix overly long snapshot tables
  qcow2: Keep track of the snapshot table length
  qcow2: Fix broken snapshot table entries
  qcow2: Add qcow2_check_fix_snapshot_table()
  qcow2: Separate qcow2_check_read_snapshot_table()
  qcow2: Write v3-compliant snapshot list on upgrade
  qcow2: Put qcow2_upgrade() into its own function
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-10-28 14:40:01 +00:00
Max Reitz e8d04f9237 block: Pass truncate exact=true where reasonable
This is a change in behavior, so all instances need a good
justification.  The comments added here should explain my reasoning.

qed already had a comment that suggests it always expected
bdrv_truncate()/blk_truncate() to behave as if exact=true were passed
(c743849bee came eight months before 55b949c847), so it was simply
broken until now.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190918095144.955-8-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
[mreitz: Changed comment in qed.c to explain why a new QED file must be
         empty, as requested and suggested by Maxim]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 12:08:45 +01:00
Max Reitz e61a28a9b6 block: Let format drivers pass @exact
When truncating a format node, the @exact parameter is generally handled
simply by virtue of the format storing the new size in the image
metadata.  Such formats do not need to pass on the parameter to their
file nodes.

There are exceptions, though:
- raw and crypto cannot store the image size, and thus must pass on
  @exact.

- When using qcow2 with an external data file, it just makes sense to
  keep its size in sync with the qcow2 virtual disk (because the
  external data file is the virtual disk).  Therefore, we should pass
  @exact when truncating it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190918095144.955-7-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 12:05:30 +01:00
Max Reitz 82325ae5f2 block: Evaluate @exact in protocol drivers
We have two protocol drivers that return success when trying to shrink a
block device even though they cannot shrink it.  This behavior is now
only allowed with exact=false, so they should return an error with
exact=true.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190918095144.955-6-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 12:05:24 +01:00
Max Reitz c80d8b06cf block: Add @exact parameter to bdrv_co_truncate()
We have two drivers (iscsi and file-posix) that (in some cases) return
success from their .bdrv_co_truncate() implementation if the block
device is larger than the requested offset, but cannot be shrunk.  Some
callers do not want that behavior, so this patch adds a new parameter
that they can use to turn off that behavior.

This patch just adds the parameter and lets the block/io.c and
block/block-backend.c functions pass it around.  All other callers
always pass false and none of the implementations evaluate it, so that
this patch does not change existing behavior.  Future patches take care
of that.

Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190918095144.955-5-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 12:00:07 +01:00
Max Reitz 26536c7fc2 block: Do not truncate file node when formatting
There is no reason why the format drivers need to truncate the protocol
node when formatting it.  When using the old .bdrv_co_create_ops()
interface, the file will be created with no size option anyway, which
generally gives it a size of 0.  (Exceptions are block devices, which
cannot be truncated anyway.)

When using blockdev-create, the user must have given the file node some
size anyway, so there is no reason why we should override that.

qed is an exception, it needs the file to start completely empty (as
explained by c743849bee).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190918095144.955-4-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:59:57 +01:00
Max Reitz bb8160eb78 block/cor: Drop cor_co_truncate()
No other filter driver has a .bdrv_co_truncate() implementation, and
there is no need to because the general block layer code can handle it
just as well.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190918095144.955-3-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:59:51 +01:00
Max Reitz 6b7e8f8b1c block: Handle filter truncation like native impl.
Make the filter truncation (passing it through to bs->file) a
first-class citizen and handle it exactly as if it was the filter
driver's native implementation of .bdrv_co_truncate().

I do not see a reason not to, it makes the code a bit shorter, and may
be even more correct because this gets us to finish the write_req that
we prepared before (may be important to e.g. bring dirty bitmaps to the
correct size).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190918095144.955-2-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:59:45 +01:00
Max Reitz e40e6e88f6 qcow2: Fix v3 snapshot table entry compliancy
qcow2 v3 images require every snapshot table entry to have at least 16
bytes of extra data.  If they do not, let qemu-img check -r all fix it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191011152814.14791-15-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:54:09 +01:00
Max Reitz d2b1d1ec73 qcow2: Repair snapshot table with too many entries
The user cannot choose which snapshots are removed.  This is fine
because we have chosen the maximum snapshot table size to be so large
(65536 entries) that it cannot be reasonably reached.  If the snapshot
table exceeds this size, the image has probably been corrupted in some
way; in this case, it is most important to just make the image usable
such that the user can copy off at least the active layer.
(Also note that the snapshots will be removed only with "-r all", so a
plain "check" or "check -r leaks" will not delete any data.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191011152814.14791-14-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:54:08 +01:00
Max Reitz 099febf3ac qcow2: Fix overly long snapshot tables
We currently refuse to open qcow2 images with overly long snapshot
tables.  This patch makes qemu-img check -r all drop all offending
entries past what we deem acceptable.

The user cannot choose which snapshots are removed.  This is fine
because we have chosen the maximum snapshot table size to be so large
(64 MB) that it cannot be reasonably reached.  If the snapshot table
exceeds this size, the image has probably been corrupted in some way; in
this case, it is most important to just make the image usable such that
the user can copy off at least the active layer.
(Also note that the snapshots will be removed only with "-r all", so a
plain "check" or "check -r leaks" will not delete any data.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191011152814.14791-13-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:54:04 +01:00
Max Reitz 624143355c qcow2: Keep track of the snapshot table length
When repairing the snapshot table, we truncate entries that have too
much extra data.  This frees up space that we do not have to count
towards the snapshot table size.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191011152814.14791-12-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:54:02 +01:00
Max Reitz f91f1f159b qcow2: Fix broken snapshot table entries
The only case where we currently reject snapshot table entries is when
they have too much extra data.  Fix them with qemu-img check -r all by
counting it as a corruption, reducing their extra_data_size, and then
letting qcow2_check_fix_snapshot_table() do the rest.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191011152814.14791-11-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:54:02 +01:00
Max Reitz fe446b5da2 qcow2: Add qcow2_check_fix_snapshot_table()
qcow2_check_read_snapshot_table() can perform consistency checks, but it
cannot fix everything.  Specifically, it cannot allocate new clusters,
because that should wait until the refcount structures are known to be
consistent (i.e., after qcow2_check_refcounts()).  Thus, it cannot call
qcow2_write_snapshots().

Do that in qcow2_check_fix_snapshot_table(), which is called after
qcow2_check_refcounts().

Currently, there is nothing that would set result->corruptions, so this
is a no-op.  A follow-up patch will change that.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191011152814.14791-10-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:54:01 +01:00
Max Reitz 8bc584fe03 qcow2: Separate qcow2_check_read_snapshot_table()
Reading the snapshot table can fail.  That is a problem when we want to
repair the image.

Therefore, stop reading the snapshot table in qcow2_do_open() in check
mode.  Instead, add a new function qcow2_check_read_snapshot_table()
that reads the snapshot table at a later point.  In the future, we want
to handle errors here and fix them.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191011152814.14791-9-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:54:00 +01:00
Max Reitz 0a85af351d qcow2: Write v3-compliant snapshot list on upgrade
qcow2 v3 requires every snapshot table entry to have two extra data
fields: The 64-bit VM state size, and the virtual disk size.  Both are
optional for v2 images, so they may not be present.

qcow2_upgrade() therefore should update the snapshot table to ensure all
entries have these extra data fields.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1727347
Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191011152814.14791-8-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:53:52 +01:00
Max Reitz 722efb0c7c qcow2: Put qcow2_upgrade() into its own function
This does not make sense right now, but it will make sense once we need
to do more than to just update s->qcow_version.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191011152814.14791-7-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:53:20 +01:00
Max Reitz e0314b56b2 qcow2: Make qcow2_write_snapshots() public
Updating the snapshot list will be useful when upgrading a v2 image to
v3, so we will need to call this function in qcow2.c.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191011152814.14791-6-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:53:17 +01:00
Max Reitz fcf9a6b728 qcow2: Keep unknown extra snapshot data
The qcow2 specification says to ignore unknown extra data fields in
snapshot table entries.  Currently, we discard it whenever we update the
image, which is a bit different from "ignore".

This patch makes the qcow2 driver keep all unknown extra data fields
when updating an image's snapshot table.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191011152814.14791-5-mreitz@redhat.com
[mreitz: Adjusted comments as proposed by Eric]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:52:57 +01:00
Max Reitz ecf6c7c0c1 qcow2: Add Error ** to qcow2_read_snapshots()
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191011152814.14791-4-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:51:09 +01:00
Max Reitz d8fa8442ad qcow2: Use endof()
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191011152814.14791-3-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:51:08 +01:00
Max Reitz f93c3add3a mirror: Do not dereference invalid pointers
mirror_exit_common() may be called twice (if it is called from
mirror_prepare() and fails, it will be called from mirror_abort()
again).

In such a case, many of the pointers in the MirrorBlockJob object will
already be freed.  This can be seen most reliably for s->target, which
is set to NULL (and then dereferenced by blk_bs()).

Cc: qemu-stable@nongnu.org
Fixes: 737efc1eda
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20191014153931.20699-2-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:49:37 +01:00
Maxim Levitsky e87a09d625 block/nvme: add support for discard
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-id: 20190913133627.28450-3-mlevitsk@redhat.com
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:34:35 +01:00
Maxim Levitsky e0dd95e373 block/nvme: add support for write zeros
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-id: 20190913133627.28450-2-mlevitsk@redhat.com
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:34:30 +01:00
Vladimir Sementsov-Ogievskiy 0e2402452f block/block-copy: increase buffered copy request
No reason to limit buffered copy to one cluster. Let's allow up to 1
MiB.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191022111805.3432-7-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:22:31 +01:00
Vladimir Sementsov-Ogievskiy 7f739d0e53 block/block-copy: add memory limit
Currently total allocation for parallel requests to block-copy instance
is unlimited. Let's limit it to 128 MiB.

For now block-copy is used only in backup, so actually we limit total
allocation for backup job.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191022111805.3432-6-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:22:31 +01:00
Vladimir Sementsov-Ogievskiy e332a726da block/block-copy: refactor copying
Merge copying code into one function block_copy_do_copy, which only
calls bdrv_ io functions and don't do any synchronization (like dirty
bitmap set/reset).

Refactor block_copy() function so that it takes full decision about
size of chunk to be copied and does all the synchronization (checking
intersecting requests, set/reset dirty bitmaps).

It will help:
 - introduce parallel processing of block_copy iterations: we need to
   calculate chunk size, start async chunk copying and go to the next
   iteration
 - simplify synchronization improvement (like memory limiting in
   further commit and reducing critical section (now we lock the whole
   requested range, when actually we need to lock only dirty region
   which we handle at the moment))

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191022111805.3432-4-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:22:31 +01:00
Vladimir Sementsov-Ogievskiy b3b7036afb block/block-copy: limit copy_range_size to 16 MiB
Large copy range may imply memory allocation and large io effort, so
using 2G copy range request may be bad idea. Let's limit it to 16 MiB.
It also helps the following patch to refactor copy-with-offload
fallback to copy-with-bounce-buffer.

Note, that total memory usage of backup is still not limited, it will
be fixed in further commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191022111805.3432-3-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:22:31 +01:00
Vladimir Sementsov-Ogievskiy 3816edd2cb block/block-copy: allocate buffer in block_copy_with_bounce_buffer
Move bounce_buffer allocation block_copy_with_bounce_buffer. This
commit simplifies further work on implementing copying by larger chunks
(of different size) and further asynchronous handling of block_copy
iterations (with help of block/aio_task API).

Allocation works fast, a lot faster than disk io, so it's not a problem
that we now allocate/free bounce_buffer more times. And we anyway will
have to allocate several bounce_buffers for parallel execution of loop
iterations in future.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191022111805.3432-2-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:22:31 +01:00
Vladimir Sementsov-Ogievskiy 994b44ab20 Revert "mirror: Only mirror granularity-aligned chunks"
This reverts commit 9adc1cb49a.
    "mirror: Only mirror granularity-aligned chunks"

Since previous commit unaligned chunks are supported by
do_sync_target_write.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191011090711.19940-6-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:22:30 +01:00
Vladimir Sementsov-Ogievskiy dbdf699cad block/mirror: support unaligned write in active mirror
Prior 9adc1cb49a do_sync_target_write had a bug: it reset aligned-up
region in the dirty bitmap, which means that we may not copy some bytes
and assume them copied, which actually leads to producing corrupted
target.

So 9adc1cb49a forced dirty bitmap granularity to be
request_alignment for mirror-top filter, so we are not working with
unaligned requests. However forcing large alignment obviously decreases
performance of unaligned requests.

This commit provides another solution for the problem: if unaligned
padding is already dirty, we can safely ignore it, as
1. It's dirty, it will be copied by mirror_iteration anyway
2. It's dirty, so skipping it now we don't increase dirtiness of the
   bitmap and therefore don't damage "synchronicity" of the
   write-blocking mirror.

If unaligned padding is not dirty, we just write it, no reason to touch
dirty bitmap if we succeed (on failure we'll set the whole region
ofcourse, but we loss "synchronicity" on failure anyway).

Note: we need to disable dirty_bitmap, otherwise we will not be able to
see in do_sync_target_write bitmap state before current operation. We
may of course check dirty bitmap before the operation in
bdrv_mirror_top_do_write and remember it, but we don't need active
dirty bitmap for write-blocking mirror anyway.

New code-path is unused until the following commit reverts
9adc1cb49a.

Suggested-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20191011090711.19940-5-vsementsov@virtuozzo.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:22:30 +01:00
Vladimir Sementsov-Ogievskiy b30168647f block/block-backend: add blk_co_pwritev_part
Add blk write function with qiov_offset parameter. It's needed for the
following commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191011090711.19940-4-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:22:30 +01:00
Vladimir Sementsov-Ogievskiy 5c511ac375 block/mirror: simplify do_sync_target_write
do_sync_target_write is called from bdrv_mirror_top_do_write after
write/discard operation, all inside active_write/active_write_settle
protecting us from mirror iteration. So the whole area is dirty for
sure, no reason to examine dirty bitmap.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191011090711.19940-3-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:22:30 +01:00
Wei Yang 038adc2f58 core: replace getpagesize() with qemu_real_host_page_size
There are three page size in qemu:

  real host page size
  host page size
  target page size

All of them have dedicate variable to represent. For the last two, we
use the same form in the whole qemu project, while for the first one we
use two forms: qemu_real_host_page_size and getpagesize().

qemu_real_host_page_size is defined to be a replacement of
getpagesize(), so let it serve the role.

[Note] Not fully tested for some arch or device.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20191013021145.16011-3-richardw.yang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-26 15:38:06 +02:00
Kevin Wolf 5e97855052 qcow2: Fix corruption bug in qcow2_detect_metadata_preallocation()
qcow2_detect_metadata_preallocation() calls qcow2_get_refcount() which
requires s->lock to be taken to protect its accesses to the refcount
table and refcount blocks. However, nothing in this code path actually
took the lock. This could cause the same cache entry to be used by two
requests at the same time, for different tables at different offsets,
resulting in image corruption.

As it would be preferable to base the detection on consistent data (even
though it's just heuristics), let's take the lock not only around the
qcow2_get_refcount() calls, but around the whole function.

This patch takes the lock in qcow2_co_block_status() earlier and asserts
in qcow2_detect_metadata_preallocation() that we hold the lock.

Fixes: 69f47505ee
Cc: qemu-stable@nongnu.org
Reported-by: Michael Weiser <michael.weiser@gmx.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Michael Weiser <michael.weiser@gmx.de>
Reviewed-by: Michael Weiser <michael.weiser@gmx.de>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2019-10-25 15:18:55 +02:00
Vladimir Sementsov-Ogievskiy 8ccf458af5 block/backup: drop dead code from backup_job_create
After commit 00e30f05de, there is no more "goto error" points
after job creation, so after "error:" @job is always NULL and we don't
need roll-back job creation.

Reported-by: Coverity (CID 1406402)
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-10-25 15:15:01 +02:00
Vladimir Sementsov-Ogievskiy f7651539d8 block/nbd: nbd reconnect
Implement reconnect. To achieve this:

1. add new modes:
   connecting-wait: means, that reconnecting is in progress, and there
     were small number of reconnect attempts, so all requests are
     waiting for the connection.
   connecting-nowait: reconnecting is in progress, there were a lot of
     attempts of reconnect, all requests will return errors.

   two old modes are used too:
   connected: normal state
   quit: exiting after fatal error or on close

Possible transitions are:

   * -> quit
   connecting-* -> connected
   connecting-wait -> connecting-nowait (transition is done after
                      reconnect-delay seconds in connecting-wait mode)
   connected -> connecting-wait

2. Implement reconnect in connection_co. So, in connecting-* mode,
    connection_co, tries to reconnect unlimited times.

3. Retry nbd queries on channel error, if we are in connecting-wait
    state.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20191009084158.15614-3-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-10-22 09:22:07 -05:00
Vladimir Sementsov-Ogievskiy 4dd09f6223 qcow2-bitmap: move bitmap reopen-rw code to qcow2_reopen_commit
The only reason I can imagine for this strange code at the very-end of
bdrv_reopen_commit is the fact that bs->read_only updated after
calling drv->bdrv_reopen_commit in bdrv_reopen_commit. And in the same
time, prior to previous commit, qcow2_reopen_bitmaps_rw did a wrong
check for being writable, when actually it only need writable file
child not self.

So, as it's fixed, let's move things to correct place.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Acked-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190927122355.7344-10-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17 17:53:28 -04:00
Vladimir Sementsov-Ogievskiy f6333cbf8b block/qcow2-bitmap: fix and improve qcow2_reopen_bitmaps_rw
- Correct check for write access to file child, and in correct place
  (only if we want to write).
- Support reopen rw -> rw (which will be used in following commit),
  for example, !bdrv_dirty_bitmap_readonly() is not a corruption if
  bitmap is marked IN_USE in the image.
- Consider unexpected bitmap as a corruption and check other
  combinations of in-image and in-RAM bitmaps.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190927122355.7344-9-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17 17:53:28 -04:00
Vladimir Sementsov-Ogievskiy 644ddbb754 block/qcow2-bitmap: do not remove bitmaps on reopen-ro
qcow2_reopen_bitmaps_ro wants to store bitmaps and then mark them all
readonly. But the latter don't work, as
qcow2_store_persistent_dirty_bitmaps removes bitmaps after storing.
It's OK for inactivation but bad idea for reopen-ro. And this leads to
the following bug:

Assume we have persistent bitmap 'bitmap0'.
Create external snapshot
  bitmap0 is stored and therefore removed
Commit snapshot
  now we have no bitmaps
Do some writes from guest (*)
  they are not marked in bitmap
Shutdown
Start
  bitmap0 is loaded as valid, but it is actually broken! It misses
  writes (*)
Incremental backup
  it will be inconsistent

So, let's stop removing bitmaps on reopen-ro. But don't rejoice:
reopening bitmaps to rw is broken too, so the whole scenario will not
work after this patch and we can't enable corresponding test cases in
260 iotests still. Reopening bitmaps rw will be fixed in the following
patches.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190927122355.7344-7-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy bd429a884c block/qcow2-bitmap: drop qcow2_reopen_bitmaps_rw_hint()
The function is unused, drop it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190927122355.7344-6-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy f88676c149 block/qcow2-bitmap: get rid of bdrv_has_changed_persistent_bitmaps
Firstly, no reason to optimize failure path. Then, function name is
ambiguous: it checks for readonly and similar things, but someone may
think that it will ignore normal bitmaps which was just unchanged, and
this is in bad relation with the fact that we should drop IN_USE flag
for unchanged bitmaps in the image.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190927122355.7344-5-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy ef9041a7b8 block/dirty-bitmap: refactor bdrv_dirty_bitmap_next
bdrv_dirty_bitmap_next is always used in same pattern. So, split it
into _next and _first, instead of combining two functions into one and
add FOR_EACH_DIRTY_BITMAP macro.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190916141911.5255-5-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy 1e63830160 block/dirty-bitmap: drop BdrvDirtyBitmap.mutex
mutex field is just a pointer to bs->dirty_bitmap_mutex, so no needs
to store it in BdrvDirtyBitmap when we have bs pointer in it (since
previous patch).

Drop mutex field. Constantly use bdrv_dirty_bitmaps_lock/unlock in
block/dirty-bitmap.c to make it more obvious that it's not per-bitmap
lock. Still, for simplicity, leave bdrv_dirty_bitmap_lock/unlock
functions as an external API.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190916141911.5255-4-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy 5deb6cbd1f block/dirty-bitmap: add bs link
Add bs field to BdrvDirtyBitmap structure. Drop BlockDriverState
parameter from bitmap APIs where possible.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190916141911.5255-3-vsementsov@virtuozzo.com
[Rebased on top of block-copy. --js]
Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy 767db3aad8 block/dirty-bitmap: drop meta
Drop meta bitmaps, as they are unused.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190916141911.5255-2-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy d2c3080e41 block/qcow2: proper locking on bitmap add/remove paths
qmp_block_dirty_bitmap_add and do_block_dirty_bitmap_remove do acquire
aio context since 0a6c86d024. But this is not enough: we also must
lock qcow2 mutex when access in-image metadata. Especially it concerns
freeing qcow2 clusters.

To achieve this, move qcow2_can_store_new_dirty_bitmap and
qcow2_remove_persistent_dirty_bitmap to coroutine context.

Since we work in coroutines in correct aio context, we don't need
context acquiring in blockdev.c anymore, drop it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190920082543.23444-4-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy b56a1e3175 block/dirty-bitmap: return int from bdrv_remove_persistent_dirty_bitmap
It's more comfortable to not deal with local_err.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190920082543.23444-3-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy 85cc8a4f6b block: move bdrv_can_store_new_dirty_bitmap to block/dirty-bitmap.c
block/dirty-bitmap.c seems to be more appropriate for it and
bdrv_remove_persistent_dirty_bitmap already in it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190920082543.23444-2-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17 17:02:32 -04:00
Max Reitz d1b9d19f99 qcow2: Limit total allocation range to INT_MAX
When the COW areas are included, the size of an allocation can exceed
INT_MAX.  This is kind of limited by handle_alloc() in that it already
caps avail_bytes at INT_MAX, but the number of clusters still reflects
the original length.

This can have all sorts of effects, ranging from the storage layer write
call failing to image corruption.  (If there were no image corruption,
then I suppose there would be data loss because the .cow_end area is
forced to be empty, even though there might be something we need to
COW.)

Fix all of it by limiting nb_clusters so the equivalent number of bytes
will not exceed INT_MAX.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-10-14 17:12:48 +02:00
Alberto Garcia f2208fdc5b block: Reject misaligned write requests with BDRV_REQ_NO_FALLBACK
The BDRV_REQ_NO_FALLBACK flag means that an operation should only be
performed if it can be offloaded or otherwise performed efficiently.

However a misaligned write request requires a RMW so we should return
an error and let the caller decide how to proceed.

This hits an assertion since commit c8bb23cbdb if the required
alignment is larger than the cluster size:

qemu-img create -f qcow2 -o cluster_size=2k img.qcow2 4G
qemu-io -c "open -o driver=qcow2,file.align=4k blkdebug::img.qcow2" \
        -c 'write 0 512'
qemu-io: block/io.c:1127: bdrv_driver_pwritev: Assertion `!(flags & BDRV_REQ_NO_FALLBACK)' failed.
Aborted

The reason is that when writing to an unallocated cluster we try to
skip the copy-on-write part and zeroize it using BDRV_REQ_NO_FALLBACK
instead, resulting in a write request that is too small (2KB cluster
size vs 4KB required alignment).

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-10-14 17:12:48 +02:00
Pavel Dovgalyuk e4ec5ad464 replay: add BH oneshot event for block layer
Replay is capable of recording normal BH events, but sometimes
there are single use callbacks scheduled with aio_bh_schedule_oneshot
function. This patch enables recording and replaying such callbacks.
Block layer uses these events for calling the completion function.
Replaying these calls makes the execution deterministic.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-10-14 17:12:48 +02:00
Pavel Dovgalyuk c8aa7895eb replay: don't drain/flush bdrv queue while RR is working
In record/replay mode bdrv queue is controlled by replay mechanism.
It does not allow saving or loading the snapshots
when bdrv queue is not empty. Stopping the VM is not blocked by nonempty
queue, but flushing the queue is still impossible there,
because it may cause deadlocks in replay mode.
This patch disables bdrv_drain_all and bdrv_flush_all in
record/replay mode.

Stopping the machine when the IO requests are not finished is needed
for the debugging. E.g., breakpoint may be set at the specified step,
and forcing the IO requests to finish may break the determinism
of the execution.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-10-14 17:12:48 +02:00
Pavel Dovgalyuk 3c6c4348f2 block: implement bdrv_snapshot_goto for blkreplay
This patch enables making snapshots with blkreplay used in
block devices.
This function is required to make bdrv_snapshot_goto without
calling .bdrv_open which is not implemented.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-10-14 17:12:48 +02:00
Peter Lieven 6caaad46de block/vhdx: add check for truncated image files
qemu is currently not able to detect truncated vhdx image files.
Add a basic check if all allocated blocks are reachable at open and
report all errors during bdrv_co_check.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-10-14 17:12:48 +02:00
Maxim Levitsky e99754b42e nbd: add empty .bdrv_reopen_prepare
Fixes commit job / qemu-img commit, when
commiting qcow2 file which is based on nbd export.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1718727
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-id: 20190930213820.29777-2-mlevitsk@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy 00e30f05de block/backup: use backup-top instead of write notifiers
Drop write notifiers and use filter node instead.

= Changes =

1. Add filter-node-name argument for backup qmp api. We have to do it
in this commit, as 257 needs to be fixed.

2. There are no more write notifiers here, so is_write_notifier
parameter is dropped from block-copy paths.

3. To sync with in-flight requests at job finish we now have drained
removing of the filter, we don't need rw-lock.

4. Block-copy is now using BdrvChildren instead of BlockBackends

5. As backup-top owns these children, we also move block-copy state
into backup-top's ownership.

= Iotest changes =

56: op-blocker doesn't shoot now, as we set it on source, but then
check on filter, when trying to start second backup.
To keep the test we instead can catch another collision: both jobs will
get 'drive0' job-id, as job-id parameter is unspecified. To prevent
interleaving with file-posix locks (as they are dependent on config)
let's use another target for second backup.

Also, it's obvious now that we'd like to drop this op-blocker at all
and add a test-case for two backups from one node (to different
destinations) actually works. But not in these series.

141: Output changed: prepatch, "Node is in use" comes from bdrv_has_blk
check inside qmp_blockdev_del. But we've dropped block-copy blk
objects, so no more blk objects on source bs (job blk is on backup-top
filter bs). New message is from op-blocker, which is the next check in
qmp_blockdev_add.

257: The test wants to emulate guest write during backup. They should
go to filter node, not to original source node, of course. Therefore we
need to specify filter node name and use it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20191001131409.14202-6-vsementsov@virtuozzo.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy 7df7868b96 block: introduce backup-top filter driver
Backup-top filter caches write operations and does copy-before-write
operations.

The driver will be used in backup instead of write-notifiers.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20191001131409.14202-5-vsementsov@virtuozzo.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy 0f4b02b73e block/block-copy: split block_copy_set_callbacks function
Split block_copy_set_callbacks out of block_copy_state_new. It's needed
for further commit: block-copy will use BdrvChildren of backup-top
filter, so it will be created from backup-top filter creation function.
But callbacks will still belong to backup job and will be set in
separate.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20191001131409.14202-4-vsementsov@virtuozzo.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy 843670f30f block/backup: move write_flags calculation inside backup_job_create
This is logic-less refactoring, which simplifies further patch, as
we'll need write_flags for backup-top filter creation and backup-top
should be created before block job creation.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20191001131409.14202-3-vsementsov@virtuozzo.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy a6ffe1998c block/backup: move in-flight requests handling from backup to block-copy
Move synchronization mechanism to block-copy, to be able to use one
block-copy instance from backup job and backup-top filter in parallel.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20191001131409.14202-2-vsementsov@virtuozzo.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:18 +02:00
Anton Nefedov d924559953 qapi: query-blockstat: add driver specific file-posix stats
A block driver can provide a callback to report driver-specific
statistics.

file-posix driver now reports discard statistics

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20190923121737.83281-10-anton.nefedov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:18 +02:00
Anton Nefedov 1c45036636 file-posix: account discard operations
This will help to identify how many of the user-issued discard operations
(accounted on a device level) have actually suceeded down on the host file
(even though the numbers will not be exactly the same if non-raw format
driver is used (e.g. qcow2 sending metadata discards)).

Note that these numbers will not include discards triggered by
write-zeroes + MAY_UNMAP calls.

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190923121737.83281-9-anton.nefedov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:18 +02:00
Anton Nefedov f344446654 block: add empty account cookie type
Each block_acct_done/failed call is designed to correspond to a
previous block_acct_start call, which initializes the stats cookie.
However sometimes it is not the case, e.g. some error paths might
report the same cookie twice because it is hard to accurately track if
the cookie was reported yet or not.

This patch cleans the cookie after report.
(Note: block_acct_failed/done without a previous block_acct_start at
all should be avoided. Uninitialized cookie might hold a garbage value
and there is still "< BLOCK_MAX_IOTYPE" assertion for that)

It will be particularly useful in ide code where it's hard to
keep track whether the request done its accounting or not: in the
following patch of the series, trim requests will do the accounting
separately.

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190923121737.83281-4-anton.nefedov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:18 +02:00
Anton Nefedov 159f85ddc8 qapi: add unmap to BlockDeviceStats
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20190923121737.83281-3-anton.nefedov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy beb5f5450d block: move block_copy from block/backup.c to separate file
Split block_copy to separate file, to be cleanly shared with backup-top
filter driver in further commits.

It's a clean movement, the only change is drop "static" from interface
functions.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190920142056.12778-8-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy 0e23e382b7 block/backup: fix block-comment style
We need to fix comment style around block-copy functions before further
moving them to separate file to satisfy checkpatch. But do more: fix
all comments style. Also, seems like doubled first asterisk is not
forbidden, but drop it too for consistency.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190920142056.12778-7-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy 2c8074c453 block/backup: introduce BlockCopyState
Split copying code part from backup to "block-copy", including separate
state structure and function renaming. This is needed to share it with
backup-top filter driver in further commits.

Notes:

1. As BlockCopyState keeps own BlockBackend objects, remaining
job->common.blk users only use it to get bs by blk_bs() call, so clear
job->commen.blk permissions set in block_job_create and add
job->source_bs to be used instead of blk_bs(job->common.blk), to keep
it more clear which bs we use when introduce backup-top filter in
further commit.

2. Rename s/initializing_bitmap/skip_unallocated/ to sound a bit better
as interface to BlockCopyState

3. Split is not very clean: there left some duplicated fields, backup
code uses some BlockCopyState fields directly, let's postpone it for
further improvements and keep this comment simpler for review.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190920142056.12778-6-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy 372c67ea61 block/backup: improve comment about image fleecing
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190920142056.12778-5-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy 0bd0c44372 block/backup: split shareable copying part from backup_do_cow
Split copying logic which will be shared with backup-top filter.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190920142056.12778-4-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy 1048ddf0a3 block/backup: fix backup_cow_with_offload for last cluster
We shouldn't try to copy bytes beyond EOF. Fix it.

Fixes: 9ded4a0114
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190920142056.12778-3-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy 981fb5810a block/backup: fix max_transfer handling for copy_range
Of course, QEMU_ALIGN_UP is a typo, it should be QEMU_ALIGN_DOWN, as we
are trying to find aligned size which satisfy both source and target.
Also, don't ignore too small max_transfer. In this case seems safer to
disable copy_range.

Fixes: 9ded4a0114
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190920142056.12778-2-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy d710cf575a block/qcow2: introduce parallel subrequest handling in read and write
It improves performance for fragmented qcow2 images.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190916175324.18478-6-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy 6aa7a2631b block/qcow2: refactor qcow2_co_pwritev_part
Similarly to previous commit, prepare for parallelizing write-loop
iterations.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190916175324.18478-5-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy 88f468e546 block/qcow2: refactor qcow2_co_preadv_part
Further patch will run partial requests of iterations of
qcow2_co_preadv in parallel for performance reasons. To prepare for
this, separate part which may be parallelized into separate function
(qcow2_co_preadv_task).

While being here, also separate encrypted clusters reading to own
function, like it is done for compressed reading.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190916175324.18478-4-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy 6e9b225f73 block: introduce aio task pool
Common interface for aio task loops. To be used for improving
performance of synchronous io loops in qcow2, block-stream,
copy-on-read, and may be other places.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190916175324.18478-3-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:17 +02:00
Max Reitz 8644476e51 block: Skip COR for inactive nodes
We must not write data to inactive nodes, and a COR is certainly
something we can simply not do without upsetting anyone.  So skip COR
operations on inactive nodes.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191001174827.11081-2-mreitz@redhat.com
Message-Id: <20191001174827.11081-2-mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-10-08 14:28:25 +01:00
Kevin Wolf 05f4aced65 block/snapshot: Restrict set of snapshot nodes
Nodes involved in internal snapshots were those that were returned by
bdrv_next(), inserted and not read-only. bdrv_next() in turn returns all
nodes that are either the root node of a BlockBackend or monitor-owned
nodes.

With the typical -drive use, this worked well enough. However, in the
typical -blockdev case, the user defines one node per option, making all
nodes monitor-owned nodes. This includes protocol nodes etc. which often
are not snapshottable, so "savevm" only returns an error.

Change the conditions so that internal snapshot still include all nodes
that have a BlockBackend attached (we definitely want to snapshot
anything attached to a guest device and probably also the built-in NBD
server; snapshotting block job BlockBackends is more of an accident, but
a preexisting one), but other monitor-owned nodes are only included if
they have no parents.

This makes internal snapshots usable again with typical -blockdev
configurations.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
2019-10-04 11:52:40 +02:00
Philippe Mathieu-Daudé 31e404151b cutils: Move size_to_str() from "qemu-common.h" to "qemu/cutils.h"
"qemu/cutils.h" contains various qemu_strtosz_*() functions
useful to convert strings to size. It seems natural to have
the opposite usage (from size to string) there too.

The function definition is already in util/cutils.c.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190903120555.7551-1-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2019-09-19 11:57:34 +02:00
Maxim Levitsky 603fbd076c block/qcow2: refactor encryption code
* Change the qcow2_co_{encrypt|decrypt} to just receive full host and
  guest offsets and use this function directly instead of calling
  do_perform_cow_encrypt (which is removed by that patch).

* Adjust qcow2_co_encdec to take full host and guest offsets as well.

* Document the qcow2_co_{encrypt|decrypt} arguments
  to prevent the bug fixed in former commit from hopefully
  happening again.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-id: 20190915203655.21638-3-mlevitsk@redhat.com
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[mreitz: Let perform_cow() return the error value returned by
         qcow2_co_encrypt(), as proposed by Vladimir]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-09-16 15:36:22 +02:00
Maxim Levitsky 38e7d54bdc block/qcow2: Fix corruption introduced by commit 8ac0f15f33
This fixes subtle corruption introduced by luks threaded encryption
in commit 8ac0f15f33

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1745922

The corruption happens when we do a write that
   * writes to two or more unallocated clusters at once
   * doesn't fully cover the first sector
   * doesn't fully cover the last sector
   * uses luks encryption

In this case, when allocating the new clusters we COW both areas
prior to the write and after the write, and we encrypt them.

The above mentioned commit accidentally made it so we encrypt the
second COW area using the physical cluster offset of the first area.

The problem is that offset_in_cluster in do_perform_cow_encrypt
can be larger that the cluster size, thus cluster_offset
will no longer point to the start of the cluster at which encrypted
area starts.

Next patch in this series will refactor the code to avoid all these
assumptions.

In the bugreport that was triggered by rebasing a luks image to new,
zero filled base, which lot of such writes, and causes some files
with zero areas to contain garbage there instead.
But as described above it can happen elsewhere as well

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190915203655.21638-2-mlevitsk@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-09-16 15:35:02 +02:00
Max Reitz c34dc07f9f curl: Check curl_multi_add_handle()'s return code
If we had done that all along, debugging would have been much simpler.
(Also, I/O errors are better than hangs.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190910124136.10565-8-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-09-16 15:31:12 +02:00
Max Reitz bfb23b480a curl: Handle success in multi_check_completion
Background: As of cURL 7.59.0, it verifies that several functions are
not called from within a callback.  Among these functions is
curl_multi_add_handle().

curl_read_cb() is a callback from cURL and not a coroutine.  Waking up
acb->co will lead to entering it then and there, which means the current
request will settle and the caller (if it runs in the same coroutine)
may then issue the next request.  In such a case, we will enter
curl_setup_preadv() effectively from within curl_read_cb().

Calling curl_multi_add_handle() will then fail and the new request will
not be processed.

Fix this by not letting curl_read_cb() wake up acb->co.  Instead, leave
the whole business of settling the AIOCB objects to
curl_multi_check_completion() (which is called from our timer callback
and our FD handler, so not from any cURL callbacks).

Reported-by: Natalie Gavrielov <ngavrilo@redhat.com>
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1740193
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190910124136.10565-7-mreitz@redhat.com
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-09-16 15:31:11 +02:00
Max Reitz 9abaf9fc47 curl: Report only ready sockets
Instead of reporting all sockets to cURL, only report the one that has
caused curl_multi_do_locked() to be called.  This lets us get rid of the
QLIST_FOREACH_SAFE() list, which was actually wrong: SAFE foreaches are
only safe when the current element is removed in each iteration.  If it
possible for the list to be concurrently modified, we cannot guarantee
that only the current element will be removed.  Therefore, we must not
use QLIST_FOREACH_SAFE() here.

Fixes: ff5ca1664a
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190910124136.10565-6-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-09-16 15:31:11 +02:00
Max Reitz 9dbad87d25 curl: Pass CURLSocket to curl_multi_do()
curl_multi_do_locked() currently marks all sockets as ready.  That is
not only inefficient, but in fact unsafe (the loop is).  A follow-up
patch will change that, but to do so, curl_multi_do_locked() needs to
know exactly which socket is ready; and that is accomplished by this
patch here.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190910124136.10565-5-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-09-16 15:31:11 +02:00
Max Reitz 948403bcb1 curl: Check completion in curl_multi_do()
While it is more likely that transfers complete after some file
descriptor has data ready to read, we probably should not rely on it.
Better be safe than sorry and call curl_multi_check_completion() in
curl_multi_do(), too, just like it is done in curl_multi_read().

With this change, curl_multi_do() and curl_multi_read() are actually the
same, so drop curl_multi_read() and use curl_multi_do() as the sole FD
handler.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190910124136.10565-4-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-09-16 15:31:11 +02:00
Max Reitz 007f339b10 curl: Keep *socket until the end of curl_sock_cb()
This does not really change anything, but it makes the code a bit easier
to follow once we use @socket as the opaque pointer for
aio_set_fd_handler().

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190910124136.10565-3-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-09-16 15:31:11 +02:00
Max Reitz 0487861685 curl: Keep pointer to the CURLState in CURLSocket
A follow-up patch will make curl_multi_do() and curl_multi_read() take a
CURLSocket instead of the CURLState.  They still need the latter,
though, so add a pointer to it to the former.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190910124136.10565-2-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-09-16 15:31:11 +02:00
Nir Soffer 1bbbf32d5f block: Use QEMU_IS_ALIGNED
Replace instances of:

    (n & (BDRV_SECTOR_SIZE - 1)) == 0

And:

   (n & ~BDRV_SECTOR_MASK) == 0

With:

    QEMU_IS_ALIGNED(n, BDRV_SECTOR_SIZE)

Which reveals the intent of the code better, and makes it easier to
locate the code checking alignment.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Message-id: 20190827185913.27427-2-nsoffer@redhat.com
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-09-16 14:48:30 +02:00
Alberto Garcia bf3d78ae55 qcow2: Stop overwriting compressed clusters one by one
handle_alloc() tries to find as many contiguous clusters that need
copy-on-write as possible in order to allocate all of them at the same
time.

However, compressed clusters are only overwritten one by one, so let's
say that we have an image with 1024 consecutive compressed clusters:

   qemu-img create -f qcow2 hd.qcow2 64M
   for f in `seq 0 64 65472`; do
      qemu-io -c "write -c ${f}k 64k" hd.qcow2
   done

In this case trying to overwrite the whole image with one large write
request results in 1024 separate allocations:

   qemu-io -c "write 0 64M" hd.qcow2

This restriction comes from commit 095a9c58ce from 2008.
Nowadays QEMU can overwrite multiple compressed clusters just fine,
and in fact it already does: as long as the first cluster that
handle_alloc() finds is not compressed, all other compressed clusters
in the same batch will be overwritten in one go:

   qemu-img create -f qcow2 hd.qcow2 64M
   qemu-io -c "write -z 0 64k" hd.qcow2
   for f in `seq 64 64 65472`; do
      qemu-io -c "write -c ${f}k 64k" hd.qcow2
   done

Compared to the previous one, overwriting this image on my computer
goes from 8.35s down to 230ms.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: John Snow <jsnow@redhat.com
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-09-13 12:18:37 +02:00
Philippe Mathieu-Daudé d90d5cae2b block/create: Do not abort if a block driver is not available
The 'blockdev-create' QMP command was introduced as experimental
feature in commit b0292b851b, using the assert() debug call.
It got promoted to 'stable' command in 3fb588a0f2, but the
assert call was not removed.

Some block drivers are optional, and bdrv_find_format() might
return a NULL value, triggering the assertion.

Stable code is not expected to abort, so return an error instead.

This is easily reproducible when libnfs is not installed:

  ./configure
  [...]
  module support    no
  Block whitelist (rw)
  Block whitelist (ro)
  libiscsi support  yes
  libnfs support    no
  [...]

Start QEMU:

  $ qemu-system-x86_64 -S -qmp unix:/tmp/qemu.qmp,server,nowait

Send the 'blockdev-create' with the 'nfs' driver:

  $ ( cat << 'EOF'
  {'execute': 'qmp_capabilities'}
  {'execute': 'blockdev-create', 'arguments': {'job-id': 'x', 'options': {'size': 0, 'driver': 'nfs', 'location': {'path': '/', 'server': {'host': '::1', 'type': 'inet'}}}}, 'id': 'x'}
  EOF
  ) | socat STDIO UNIX:/tmp/qemu.qmp
  {"QMP": {"version": {"qemu": {"micro": 50, "minor": 1, "major": 4}, "package": "v4.1.0-733-g89ea03a7dc"}, "capabilities": ["oob"]}}
  {"return": {}}

QEMU crashes:

  $ gdb qemu-system-x86_64 core
  Program received signal SIGSEGV, Segmentation fault.
  (gdb) bt
  #0  0x00007ffff510957f in raise () at /lib64/libc.so.6
  #1  0x00007ffff50f3895 in abort () at /lib64/libc.so.6
  #2  0x00007ffff50f3769 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
  #3  0x00007ffff5101a26 in .annobin_assert.c_end () at /lib64/libc.so.6
  #4  0x0000555555d7e1f1 in qmp_blockdev_create (job_id=0x555556baee40 "x", options=0x555557666610, errp=0x7fffffffc770) at block/create.c:69
  #5  0x0000555555c96b52 in qmp_marshal_blockdev_create (args=0x7fffdc003830, ret=0x7fffffffc7f8, errp=0x7fffffffc7f0) at qapi/qapi-commands-block-core.c:1314
  #6  0x0000555555deb0a0 in do_qmp_dispatch (cmds=0x55555645de70 <qmp_commands>, request=0x7fffdc005c70, allow_oob=false, errp=0x7fffffffc898) at qapi/qmp-dispatch.c:131
  #7  0x0000555555deb2a1 in qmp_dispatch (cmds=0x55555645de70 <qmp_commands>, request=0x7fffdc005c70, allow_oob=false) at qapi/qmp-dispatch.c:174

With this patch applied, QEMU returns a QMP error:

  {'execute': 'blockdev-create', 'arguments': {'job-id': 'x', 'options': {'size': 0, 'driver': 'nfs', 'location': {'path': '/', 'server': {'host': '::1', 'type': 'inet'}}}}, 'id': 'x'}
  {"id": "x", "error": {"class": "GenericError", "desc": "Block driver 'nfs' not found or not supported"}}

Cc: qemu-stable@nongnu.org
Reported-by: Xu Tian <xutian@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-09-13 12:18:37 +02:00
Peter Lieven d2c6becbe0 block/nfs: add support for nfs_umount
libnfs recently added support for unmounting. Add support
in Qemu too.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-09-13 12:18:15 +02:00
Peter Lieven 601dc65597 block/nfs: tear down aio before nfs_close
nfs_close is a sync call from libnfs and has its own event
handler polling on the nfs FD. Avoid that both QEMU and libnfs
are intefering here.

CC: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-09-13 12:18:14 +02:00
Max Reitz 1a37e31244 vpc: Return 0 from vpc_co_create() on success
blockdev_create_run() directly uses .bdrv_co_create()'s return value as
the job's return value.  Jobs must return 0 on success, not just any
nonnegative value.  Therefore, using blockdev-create for VPC images may
currently fail as the vpc driver may return a positive integer.

Because there is no point in returning a positive integer anywhere in
the block layer (all non-negative integers are generally treated as
complete success), we probably do not want to add more such cases.
Therefore, fix this problem by making the vpc driver always return 0 in
case of success.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-09-10 08:58:43 +02:00
Kevin Wolf effecce6bc file-posix: Fix has_write_zeroes after NO_FALLBACK
If QEMU_AIO_NO_FALLBACK is given, we always return failure and don't
even try to use the BLKZEROOUT ioctl. In this failure case, we shouldn't
disable has_write_zeroes because we didn't learn anything about the
ioctl. The next request might not set QEMU_AIO_NO_FALLBACK and we can
still use the ioctl then.

Fixes: 738301e117
Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2019-09-10 08:58:43 +02:00
Max Reitz b2c6f23f4a block/file-posix: Reduce xfsctl() use
This patch removes xfs_write_zeroes() and xfs_discard().  Both functions
have been added just before the same feature was present through
fallocate():

- fallocate() has supported PUNCH_HOLE for XFS since Linux 2.6.38 (March
  2011); xfs_discard() was added in December 2010.

- fallocate() has supported ZERO_RANGE for XFS since Linux 3.15 (June
  2014); xfs_write_zeroes() was added in November 2013.

Nowadays, all systems that qemu runs on should support both fallocate()
features (RHEL 7's kernel does).

xfsctl() is still useful for getting the request alignment for O_DIRECT,
so this patch does not remove our dependency on it completely.

Note that xfs_write_zeroes() had a bug: It calls ftruncate() when the
file is shorter than the specified range (because ZERO_RANGE does not
increase the file length).  ftruncate() may yield and then discard data
that parallel write requests have written past the EOF in the meantime.
Dropping the function altogether fixes the bug.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: 50ba5b2d99
Reported-by: Lukáš Doktor <ldoktor@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Tested-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-09-10 08:58:43 +02:00
Vladimir Sementsov-Ogievskiy bb0c940993 job: drop job_drain
In job_finish_sync job_enter should be enough for a job to make some
progress and draining is a wrong tool for it. So use job_enter directly
here and drop job_drain with all related staff not used more.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Tested-by: John Snow <jsnow@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-09-10 08:58:43 +02:00
Alberto Garcia b70d08205b qcow2: Fix the calculation of the maximum L2 cache size
The size of the qcow2 L2 cache defaults to 32 MB, which can be easily
larger than the maximum amount of L2 metadata that the image can have.
For example: with 64 KB clusters the user would need a qcow2 image
with a virtual size of 256 GB in order to have 32 MB of L2 metadata.

Because of that, since commit b749562d98
we forbid the L2 cache to become larger than the maximum amount of L2
metadata for the image, calculated using this formula:

    uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);

The problem with this formula is that the result should be rounded up
to the cluster size because an L2 table on disk always takes one full
cluster.

For example, a 1280 MB qcow2 image with 64 KB clusters needs exactly
160 KB of L2 metadata, but we need 192 KB on disk (3 clusters) even if
the last 32 KB of those are not going to be used.

However QEMU rounds the numbers down and only creates 2 cache tables
(128 KB), which is not enough for the image.

A quick test doing 4KB random writes on a 1280 MB image gives me
around 500 IOPS, while with the correct cache size I get 16K IOPS.

Cc: qemu-stable@nongnu.org
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-09-10 08:58:43 +02:00
Eric Blake f061656cc3 nbd: Implement client use of NBD FAST_ZERO
The client side is fairly straightforward: if the server advertised
fast zero support, then we can map that to BDRV_REQ_NO_FALLBACK
support.  A server that advertises FAST_ZERO but not WRITE_ZEROES
is technically broken, but we can ignore that situation as it does
not change our behavior.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-09-05 16:03:26 -05:00
Andrey Shinkevich 294682cc3a block: workaround for unaligned byte range in fallocate()
Revert the commit 118f99442d 'block/io.c: fix for the allocation failure'
and use better error handling for file systems that do not support
fallocate() for an unaligned byte range. Allow falling back to pwrite
in case fallocate() returns EINVAL.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Message-Id: <1566913973-15490-1-git-send-email-andrey.shinkevich@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-09-05 16:01:31 -05:00
Eric Blake df18c04edf nbd: Use g_autofree in a few places
Thanks to our recent move to use glib's g_autofree, I can join the
bandwagon.  Getting rid of gotos is fun ;)

There are probably more places where we could register cleanup
functions and get rid of more gotos; this patch just focuses on the
labels that existed merely to call g_free.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190824172813.29720-2-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-09-05 15:52:45 -05:00
Stefan Hajnoczi 236094c738 file-posix: fix request_alignment typo
Fixes: a6b257a08e
       ("file-posix: Handle undetectable alignment")
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190827101328.4062-1-stefanha@redhat.com
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-09-03 14:55:35 +02:00
Max Reitz bedb8bb419 vmdk: Reject invalid compressed writes
Compressed writes generally have to write full clusters, not just in
theory but also in practice when it comes to vmdk's streamOptimized
subformat.  It currently is just silently broken for writes with
non-zero in-cluster offsets:

$ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M
$ qemu-io -c 'write 4k 4k' -c 'read 4k 4k' foo.vmdk
wrote 4096/4096 bytes at offset 4096
4 KiB, 1 ops; 00.01 sec (443.724 KiB/sec and 110.9309 ops/sec)
read failed: Invalid argument

(The technical reason is that vmdk_write_extent() just writes the
incomplete compressed data actually to offset 4k.  When reading the
data, vmdk_read_extent() looks at offset 0 and finds the compressed data
size to be 0, because that is what it reads from there.  This yields an
error.)

For incomplete writes with zero in-cluster offsets, the error path when
reading the rest of the cluster is a bit different, but the result is
the same:

$ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M
$ qemu-io -c 'write 0k 4k' -c 'read 4k 4k' foo.vmdk
wrote 4096/4096 bytes at offset 0
4 KiB, 1 ops; 00.01 sec (362.641 KiB/sec and 90.6603 ops/sec)
read failed: Invalid argument

(Here, vmdk_read_extent() finds the data and then sees that the
uncompressed data is short.)

It is better to reject invalid writes than to make the user believe they
might have succeeded and then fail when trying to read it back.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190815153638.4600-5-mreitz@redhat.com
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-09-03 14:55:35 +02:00
Max Reitz cdc0dd2586 vmdk: Use bdrv_dirname() for relative extent paths
This makes iotest 033 pass with e.g. subformat=monolithicFlat.  It also
turns a former error in 059 into success.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190815153638.4600-3-mreitz@redhat.com
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-09-03 14:55:35 +02:00
Nir Soffer 3a20013fbb block: posix: Always allocate the first block
When creating an image with preallocation "off" or "falloc", the first
block of the image is typically not allocated. When using Gluster
storage backed by XFS filesystem, reading this block using direct I/O
succeeds regardless of request length, fooling alignment detection.

In this case we fallback to a safe value (4096) instead of the optimal
value (512), which may lead to unneeded data copying when aligning
requests.  Allocating the first block avoids the fallback.

Since we allocate the first block even with preallocation=off, we no
longer create images with zero disk size:

    $ ./qemu-img create -f raw test.raw 1g
    Formatting 'test.raw', fmt=raw size=1073741824

    $ ls -lhs test.raw
    4.0K -rw-r--r--. 1 nsoffer nsoffer 1.0G Aug 16 23:48 test.raw

And converting the image requires additional cluster:

    $ ./qemu-img measure -f raw -O qcow2 test.raw
    required size: 458752
    fully allocated size: 1074135040

When using format like vmdk with multiple files per image, we allocate
one block per file:

    $ ./qemu-img create -f vmdk -o subformat=twoGbMaxExtentFlat test.vmdk 4g
    Formatting 'test.vmdk', fmt=vmdk size=4294967296 compat6=off hwversion=undefined subformat=twoGbMaxExtentFlat

    $ ls -lhs test*.vmdk
    4.0K -rw-r--r--. 1 nsoffer nsoffer 2.0G Aug 27 03:23 test-f001.vmdk
    4.0K -rw-r--r--. 1 nsoffer nsoffer 2.0G Aug 27 03:23 test-f002.vmdk
    4.0K -rw-r--r--. 1 nsoffer nsoffer  353 Aug 27 03:23 test.vmdk

I did quick performance test for copying disks with qemu-img convert to
new raw target image to Gluster storage with sector size of 512 bytes:

    for i in $(seq 10); do
        rm -f dst.raw
        sleep 10
        time ./qemu-img convert -f raw -O raw -t none -T none src.raw dst.raw
    done

Here is a table comparing the total time spent:

Type    Before(s)   After(s)    Diff(%)
---------------------------------------
real      530.028    469.123      -11.4
user       17.204     10.768      -37.4
sys        17.881      7.011      -60.7

We can see very clear improvement in CPU usage.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Message-id: 20190827010528.8818-2-nsoffer@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-09-03 14:55:35 +02:00
Vladimir Sementsov-Ogievskiy 5396234b96 block/qcow2: implement .bdrv_co_pwritev(_compressed)_part
Implement and use new interface to get rid of hd_qiov.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190604161514.262241-13-vsementsov@virtuozzo.com
Message-Id: <20190604161514.262241-13-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy df893d25ce block/qcow2: implement .bdrv_co_preadv_part
Implement and use new interface to get rid of hd_qiov.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190604161514.262241-12-vsementsov@virtuozzo.com
Message-Id: <20190604161514.262241-12-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy 00721a3529 block/qcow2: refactor qcow2_co_preadv to use buffer-based io
Use buffer based io in encrypted case.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190604161514.262241-11-vsementsov@virtuozzo.com
Message-Id: <20190604161514.262241-11-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy 1acc3466a2 block/io: introduce bdrv_co_p{read, write}v_part
Introduce extended variants of bdrv_co_preadv and bdrv_co_pwritev
with qiov_offset parameter.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190604161514.262241-10-vsementsov@virtuozzo.com
Message-Id: <20190604161514.262241-10-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy 28c4da2869 block/io: bdrv_aligned_pwritev: use and support qiov_offset
Use and support new API in bdrv_aligned_pwritev.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190604161514.262241-9-vsementsov@virtuozzo.com
Message-Id: <20190604161514.262241-9-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy 65cd4424b9 block/io: bdrv_aligned_preadv: use and support qiov_offset
Use and support new API in bdrv_co_do_copy_on_readv.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190604161514.262241-8-vsementsov@virtuozzo.com
Message-Id: <20190604161514.262241-8-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy 2275cc90a1 block/io: bdrv_co_do_copy_on_readv: lazy allocation
Allocate bounce_buffer only if it is really needed. Also, sub-optimize
allocation size (why not?).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190604161514.262241-7-vsementsov@virtuozzo.com
Message-Id: <20190604161514.262241-7-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy 1143ec5ebf block/io: bdrv_co_do_copy_on_readv: use and support qiov_offset
Use and support new API in bdrv_co_do_copy_on_readv. Note that in case
of allocated-in-top we need to shrink read size to MIN(..) by hand, as
pre-patch this was actually done implicitly by qemu_iovec_concat (and
we used local_qiov.size).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190604161514.262241-6-vsementsov@virtuozzo.com
Message-Id: <20190604161514.262241-6-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy ac850bf099 block: define .*_part io handlers in BlockDriver
Add handlers supporting qiov_offset parameter:
    bdrv_co_preadv_part
    bdrv_co_pwritev_part
    bdrv_co_pwritev_compressed_part
This is used to reduce need of defining local_qiovs and hd_qiovs in all
corners of block layer code. The following patches will increase usage
of this new API part by part.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190604161514.262241-5-vsementsov@virtuozzo.com
Message-Id: <20190604161514.262241-5-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy 7a3f542fbd block/io: refactor padding
We have similar padding code in bdrv_co_pwritev,
bdrv_co_do_pwrite_zeroes and bdrv_co_preadv. Let's combine and unify
it.

[Squashed in Vladimir's qemu-iotests 077 fix
--Stefan]

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190604161514.262241-4-vsementsov@virtuozzo.com
Message-Id: <20190604161514.262241-4-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-08-27 14:58:12 +01:00
Vladimir Sementsov-Ogievskiy f76889e7b9 util/iov: improve qemu_iovec_is_zero
We'll need to check a part of qiov soon, so implement it now.

Optimization with align down to 4 * sizeof(long) is dropped due to:
1. It is strange: it aligns length of the buffer, but where is a
   guarantee that buffer pointer is aligned itself?
2. buffer_is_zero() is a better place for optimizations and it has
   them.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190604161514.262241-3-vsementsov@virtuozzo.com
Message-Id: <20190604161514.262241-3-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-08-27 14:52:45 +01:00
Max Reitz fbc8e1b7e4 vpc: Do not return RAW from block_status
vpc is not really a passthrough driver, even when using the fixed
subformat (where host and guest offsets are equal).  It should handle
preallocation like all other drivers do, namely by returning
DATA | RECURSE instead of RAW.

There is no tangible difference but the fact that bdrv_is_allocated() no
longer falls through to the protocol layer.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190725155512.9827-4-mreitz@redhat.com
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-08-19 17:13:26 +02:00
Max Reitz 4dd84ac9a7 vmdk: Make block_status recurse for flat extents
Fixes: 69f47505ee
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190725155512.9827-3-mreitz@redhat.com
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-08-19 17:13:26 +02:00
Max Reitz ad6434dc62 vdi: Make block_status recurse for fixed images
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Fixes: 69f47505ee
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190725155512.9827-2-mreitz@redhat.com
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-08-19 17:13:26 +02:00
Max Reitz 9956688a8f vhdx: Fix .bdrv_has_zero_init()
Fixed VHDX images cannot guarantee to be zero-initialized.  If the image
has the "fixed" subformat, forward the call to the underlying storage
node.

Reported-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190724171239.8764-9-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-08-19 17:13:26 +02:00
Max Reitz 0a28bf2826 vdi: Fix .bdrv_has_zero_init()
Static VDI images cannot guarantee to be zero-initialized.  If the image
has been statically allocated, forward the call to the underlying
storage node.

Reported-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20190724171239.8764-8-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-08-19 17:13:26 +02:00
Max Reitz 38841dcd27 qcow2: Fix .bdrv_has_zero_init()
If a qcow2 file is preallocated, it can no longer guarantee that it
initially appears as filled with zeroes.

So implement .bdrv_has_zero_init() by checking whether the file is
preallocated; if so, forward the call to the underlying storage node,
except for when it is encrypted: Encrypted preallocated images always
return effectively random data, so .bdrv_has_zero_init() must always
return 0 for them.

.bdrv_has_zero_init_truncate() can remain bdrv_has_zero_init_1(),
because it presupposes PREALLOC_MODE_OFF.

Reported-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190724171239.8764-7-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-08-19 17:13:26 +02:00
Max Reitz b647d69adc block: Use bdrv_has_zero_init_truncate()
vhdx and parallels call bdrv_has_zero_init() when they do not really
care about an image's post-create state but only about what happens when
you grow an image.  That is a bit ugly, and also overly safe when
growing preallocated images without preallocating the new areas.

Let them use bdrv_has_zero_init_truncate() instead.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190724171239.8764-6-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
[mreitz: Added commit message]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-08-19 17:13:26 +02:00
Max Reitz 1dcaf52760 block: Implement .bdrv_has_zero_init_truncate()
We need to implement .bdrv_has_zero_init_truncate() for every block
driver that supports truncation and has a .bdrv_has_zero_init()
implementation.

Implement it the same way each driver implements .bdrv_has_zero_init().
This is at least not any more unsafe than what we had before.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190724171239.8764-5-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-08-19 17:13:26 +02:00
Max Reitz cdf3bc934a mirror: Fix bdrv_has_zero_init() use
bdrv_has_zero_init() only has meaning for newly created images or image
areas.  If the mirror job itself did not create the image, it cannot
rely on bdrv_has_zero_init()'s result to carry any meaning.

This is the case for drive-mirror with mode=existing and always for
blockdev-mirror.

Note that we only have to zero-initialize the target with sync=full,
because other modes actually do not promise that the target will contain
the same data as the source after the job -- sync=top only promises to
copy anything allocated in the top layer, and sync=none will only copy
new I/O.  (Which is how mirror has always handled it.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190724171239.8764-3-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-08-19 17:13:26 +02:00
Maxim Levitsky 672de729a1 LUKS: support preallocation
preallocation=off and preallocation=metadata
both allocate luks header only, and preallocation=falloc/full
is passed to underlying file.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-id: 20190716161901.1430-1-mlevitsk@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-08-19 17:13:26 +02:00
Peter Maydell 3fbd3405d2 - Run the iotest during "make check"
-----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAl1XvtURHHRodXRoQHJl
 ZGhhdC5jb20ACgkQLtnXdP5wLbUqqRAAqfjPRPtXSZxaxl63O5EADugLY72s+04i
 Zc5MC4Ivwj0x1WA6JIG77kz6xmjIOizRIkF1jkyEkG+AgIrjw3rdYhCD4Iav0n/v
 nqltkOaf1FzdQYCHUTn0WUYn7Df2bSkjSTPhnqbCaGq5WjXzgzi9jhCFlZpo374J
 9yWk74nt3QlOUjLw6+cm0HxEf9IlRQdPwJNYJwsrYHgspJwcwAYJ7xaL34huoxkJ
 10fA9q6QK1bh67nZpAJOte3wQ8r35cUT4ZaIiyO0MFMrEiLp4/1gKYpkZwWq0+iV
 25rWVjogzRjp+LejMAltY9MmUekCl5ZzVVuhdt2jGPbNanzdHxHydYFUELP6WmrM
 zAyYQkDvG7JiY2F0M9rKJZOnpO2pYF2hxc/nqD04qF3HD0zG1eoIg056UlKcc5b/
 kIgR2srlj0aHWhGJ2/DV3w5ZowjJGKBAYYxQdEJmiuLpGdOimSoWXbjacDEd4mUf
 DevXv6k9cAvexZxU4cOUzpip4U3MGC0rJ1BNIgTs6eIeKq3geROTpuHjJFSBHZiP
 H/MqtoT4xTXJYomc9MiZG90fG9KyywlEF6e0GjVIcadJEmFIbJ+DfrAknVRjuaij
 ThMSIuvMEpXFhyghlApURePNi8W3FIIYHISw0JE/u7+4/7L/iYDToYeM/o563+8O
 zbj0n9fSewI=
 =oVZx
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/huth-gitlab/tags/pull-request-2019-08-17' into staging

- Run the iotest during "make check"

# gpg: Signature made Sat 17 Aug 2019 09:46:13 BST
# gpg:                using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg:                issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full]
# gpg:                 aka "Thomas Huth <thuth@redhat.com>" [full]
# gpg:                 aka "Thomas Huth <huth@tuxfamily.org>" [full]
# gpg:                 aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3  EAB9 2ED9 D774 FE70 2DB5

* remotes/huth-gitlab/tags/pull-request-2019-08-17:
  gitlab-ci: Remove qcow2 tests that are handled by "make check" already
  tests: Run the iotests during "make check" again
  block: fix NetBSD qemu-iotests failure

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-08-19 14:14:09 +01:00
Paolo Bonzini f6fc1e30cf block: fix NetBSD qemu-iotests failure
Opening a block device on NetBSD has an additional step compared to other OSes,
corresponding to raw_normalize_devicepath.  The error message in that function
is slightly different from that in raw_open_common and this was causing spurious
failures in qemu-iotests.  However, in general it is not important to know what
exact step was failing, for example in the qemu-iotests case the error message
contains the fairly unequivocal "No such file or directory" text from strerror.
We can thus fix the failures by standardizing on a single error message for
both raw_open_common and raw_normalize_devicepath; in fact, we can even
use error_setg_file_open to make sure the error message is the same as in
the rest of QEMU.

Message-Id: <20190725095920.28419-1-pbonzini@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2019-08-17 09:02:59 +02:00
Vladimir Sementsov-Ogievskiy a1ed82b443 block/backup: refactor write_flags
write flags are constant, let's store it in BackupBlockJob instead of
recalculating. It also makes two boolean fields to be unused, so,
drop them.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190730163251.755248-4-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-08-16 18:29:43 -04:00
Vladimir Sementsov-Ogievskiy 319bd5edb9 block/backup: deal with zero detection
We have detect_zeroes option, so at least for blockdev-backup user
should define it if zero-detection is needed. For drive-backup leave
detection enabled by default but do it through existing option instead
of open-coding.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190730163251.755248-2-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-08-16 18:29:43 -04:00
Vladimir Sementsov-Ogievskiy 590a63d598 qapi: add dirty-bitmaps to query-named-block-nodes result
Let's add a possibility to query dirty-bitmaps not only on root nodes.
It is useful when dealing both with snapshots and incremental backups.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 20190717173937.18747-1-jsnow@redhat.com
[Added deprecation information. --js]
Signed-off-by: John Snow <jsnow@redhat.com>
[Fixed spelling --js]
2019-08-16 18:29:43 -04:00
John Snow 1a2b8b406b block/backup: support bitmap sync modes for non-bitmap backups
Accept bitmaps and sync policies for the other backup modes.
This allows us to do things like create a bitmap synced to a full backup
without a transaction, or start a resumable backup process.

Some combinations don't make sense, though:

- NEVER policy combined with any non-BITMAP mode doesn't do anything,
  because the bitmap isn't used for input or output.
  It's harmless, but is almost certainly never what the user wanted.

- sync=NONE is more questionable. It can't use on-success because this
  job never completes with success anyway, and the resulting artifact
  of 'always' is suspect: because we start with a full bitmap and only
  copy out segments that get written to, the final output bitmap will
  always be ... a fully set bitmap.

  Maybe there's contexts in which bitmaps make sense for sync=none,
  but not without more severe changes to the current job, and omitting
  it here doesn't prevent us from adding it later.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190716000117.25219-11-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-08-16 18:29:43 -04:00
John Snow 7e30dd618e block/backup: teach TOP to never copy unallocated regions
Presently, If sync=TOP is selected, we mark the entire bitmap as dirty.
In the write notifier handler, we dutifully copy out such regions.

Fix this in three parts:

1. Mark the bitmap as being initialized before the first yield.
2. After the first yield but before the backup loop, interrogate the
allocation status asynchronously and initialize the bitmap.
3. Teach the write notifier to interrogate allocation status if it is
invoked during bitmap initialization.

As an effect of this patch, the job progress for TOP backups
now behaves like this:

- total progress starts at bdrv_length.
- As allocation status is interrogated, total progress decreases.
- As blocks are copied, current progress increases.

Taken together, the floor and ceiling move to meet each other.


Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 20190716000117.25219-10-jsnow@redhat.com
[Remove ret = -ECANCELED change. --js]
[Squash in conflict resolution based on Max's patch --js]
Message-id: c8b0ab36-79c8-0b4b-3193-4e12ed8c848b@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
2019-08-16 16:28:03 -04:00
John Snow dba8700f16 block/backup: add backup_is_cluster_allocated
Modify bdrv_is_unallocated_range to utilize the pnum return from
bdrv_is_allocated, and in the process change the semantics from
"is unallocated" to "is allocated."

Optionally returns a full number of clusters that share the same
allocation status.

This will be used to carefully toggle bits in the bitmap for sync=top
initialization in the following commits.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190716000117.25219-9-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-08-16 16:28:03 -04:00
John Snow 141cdcdf84 block/backup: centralize copy_bitmap initialization
Just a few housekeeping changes that keeps the following commit easier
to read; perform the initial copy_bitmap initialization in one place.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190716000117.25219-8-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-08-16 16:28:03 -04:00
John Snow 0fff1f1371 block/backup: improve sync=bitmap work estimates
When making backups based on bitmaps, the work estimate can be more
accurate. Update iotests to reflect the new strategy.

TOP work estimates are broken, but do not get worse with this commit.
That issue is addressed in the following commits instead.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190716000117.25219-7-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-08-16 16:28:03 -04:00
John Snow a6c9365ad4 block/backup: hoist bitmap check into QMP interface
This is nicer to do in the unified QMP interface that we have now,
because it lets us use the right terminology back at the user.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190716000117.25219-5-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-08-16 16:28:03 -04:00
John Snow c4e4b0fa59 qapi: implement block-dirty-bitmap-remove transaction action
It is used to do transactional movement of the bitmap (which is
possible in conjunction with merge command). Transactional bitmap
movement is needed in scenarios with external snapshot, when we don't
want to leave copy of the bitmap in the base image.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190708220502.12977-3-jsnow@redhat.com
[Edited "since" version to 4.2 --js]
Signed-off-by: John Snow <jsnow@redhat.com>
2019-08-16 16:28:03 -04:00
John Snow b30ffbef53 block/backup: loosen restriction on readonly bitmaps
With the "never" sync policy, we actually can utilize readonly bitmaps
now. Loosen the check at the QMP level, and tighten it based on
provided arguments down at the job creation level instead.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190709232550.10724-19-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-08-16 16:28:02 -04:00
John Snow c23909e530 block/backup: add 'always' bitmap sync policy
This adds an "always" policy for bitmap synchronization. Regardless of if
the job succeeds or fails, the bitmap is *always* synchronized. This means
that for backups that fail part-way through, the bitmap retains a record of
which sectors need to be copied out to accomplish a new backup using the
old, partial result.

In effect, this allows us to "resume" a failed backup; however the new backup
will be from the new point in time, so it isn't a "resume" as much as it is
an "incremental retry." This can be useful in the case of extremely large
backups that fail considerably through the operation and we'd like to not waste
the work that was already performed.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190709232550.10724-13-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-08-16 16:28:02 -04:00
John Snow 62aa1fbeac block/backup: upgrade copy_bitmap to BdrvDirtyBitmap
This simplifies some interface matters; namely the initialization and
(later) merging the manifest back into the sync_bitmap if it was
provided.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190709232550.10724-12-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-08-16 16:28:02 -04:00
John Snow 28636b8211 block/dirty-bitmap: add bdrv_dirty_bitmap_get
Add a public interface for get. While we're at it,
rename "bdrv_get_dirty_bitmap_locked" to "bdrv_dirty_bitmap_get_locked".

(There are more functions to rename to the bdrv_dirty_bitmap_VERB form,
but they will wait until the conclusion of this series.)

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190709232550.10724-11-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-08-16 16:28:02 -04:00
John Snow b7661ca5d8 block/dirty-bitmap: add bdrv_dirty_bitmap_merge_internal
I'm surprised it didn't come up sooner, but sometimes we have a +busy
bitmap as a source. This is dangerous from the QMP API, but if we are
the owner that marked the bitmap busy, it's safe to merge it using it as
a read only source.

It is not safe in the general case to allow users to read from in-use
bitmaps, so create an internal variant that foregoes the safety
checking.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190709232550.10724-10-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-08-16 16:28:02 -04:00
John Snow cf0cd293c6 block/backup: add 'never' policy to bitmap sync mode
This adds a "never" policy for bitmap synchronization. Regardless of if
the job succeeds or fails, we never update the bitmap. This can be used
to perform differential backups, or simply to avoid the job modifying a
bitmap.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190709232550.10724-7-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-08-16 16:28:02 -04:00
John Snow c8b5650178 block/backup: Add mirror sync mode 'bitmap'
We don't need or want a new sync mode for simple differences in
semantics.  Create a new mode simply named "BITMAP" that is designed to
make use of the new Bitmap Sync Mode field.

Because the only bitmap sync mode is 'on-success', this adds no new
functionality to the backup job (yet). The old incremental backup mode
is maintained as a syntactic sugar for sync=bitmap, mode=on-success.

Add all of the plumbing necessary to support this new instruction.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190709232550.10724-6-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-08-16 16:28:02 -04:00
Peter Maydell e018ccb3fb Block layer patches:
- file-posix: Fix O_DIRECT alignment detection
 - Fixes for concurrent block jobs
 - block-backend: Queue requests while drained (fix IDE vs. job crashes)
 - qemu-img convert: Deprecate using -n and -o together
 - iotests: Migration tests with filter nodes
 - iotests: More media change tests
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJdVnduAAoJEH8JsnLIjy/W0IgQAKft/M3aDgt0sbTzQh8vdy6A
 yAfTnnSL4Z56+8qAsqhEnplC3rZxvTkg9AGOoNYHOZKl3FgRH9r8g9/Enemh4fWu
 MH52hiRf2ytlFVurIQal3aj9O+i0YTnzuvYbysvkH4ID5zbv2QnwdagtEcBxbbYL
 NZTMZBynDzp4rKIZ7p6T/kkaklLHh4vZrjW+Mzm3LQx9JJr8TwVNqqetSfc4VKIJ
 ByaNbbihDUVjQyIaJ24DXXJdzonGrrtSbSZycturc5FzXymzSRgrXZCeSKCs8X+i
 fjwMXH5v4/UfK511ILsXiumeuxBfD2Ck4sAblFxVo06oMPRNmsAKdRLeDByE7IC1
 lWep/pB3y/au9CW2/pkWJOiaz5s5iuv2fFYidKUJ0KQ1dD7G8M9rzkQlV3FUmTZO
 jBKSxHEffXsYl0ojn0vGmZEd7FAPi3fsZibGGws1dVgxlWI93aUJsjCq0E+lHIRD
 hEmQcjqZZa4taKpj0Y3Me05GkL7tH6RYA153jDNb8rPdzriGRCLZSObEISrOJf8H
 Mh0gTLi8KJNh6bULd12Ake1tKn7ZeTXpHH+gadz9OU7eIModh1qYTSHPlhy5oAv0
 Hm9BikNlS1Hzw+a+EbLcOW7TrsteNeGr7r8T6QKPMq1sfsYcp3svbC2c+zVlQ6Ll
 mLoTssksXOkgBevVqSiS
 =T7L5
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- file-posix: Fix O_DIRECT alignment detection
- Fixes for concurrent block jobs
- block-backend: Queue requests while drained (fix IDE vs. job crashes)
- qemu-img convert: Deprecate using -n and -o together
- iotests: Migration tests with filter nodes
- iotests: More media change tests

# gpg: Signature made Fri 16 Aug 2019 10:29:18 BST
# gpg:                using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream:
  file-posix: Handle undetectable alignment
  qemu-img convert: Deprecate using -n and -o together
  block-backend: Queue requests while drained
  mirror: Keep mirror_top_bs drained after dropping permissions
  block: Remove blk_pread_unthrottled()
  iotests: Add test for concurrent stream/commit
  tests: Test mid-drain bdrv_replace_child_noperm()
  tests: Test polling in bdrv_drop_intermediate()
  block: Reduce (un)drains when replacing a child
  block: Keep subtree drained in drop_intermediate
  block: Simplify bdrv_filter_default_perms()
  iotests: Test migration with all kinds of filter nodes
  iotests: Move migration helpers to iotests.py
  iotests/118: Add -blockdev based tests
  iotests/118: Create test classes dynamically
  iotests/118: Test media change for scsi-cd

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-08-16 16:43:46 +01:00
Peter Maydell c6a2225a5a nbd patches for 2019-08-15
- Addition of InetSocketAddress keep-alive
 - Addition of BDRV_REQ_PREFETCH for more efficient copy-on-read
 - Initial refactoring in preparation of NBD reconnect
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJdVaRZAAoJEKeha0olJ0NqrGoIAJSvVLMDeWZIkHr3CQ5AbMHy
 6IHUntBwv4PEHw0FyyDU7lLgEWubTwe/7RfvyJ69kQYSJLjvHa3KEic0aa7SOETK
 hGUlSoIFHEugi+XDcYyy9EG+ItUR7jnunkwomxvFRm4XzjEHFO9ck8fOS+uq/23e
 LGDHwdoZI6vawUPftbBuRAlB3egCEcBtTWXYMk8lm3MXHOHL7O18DRkfWvwcHfl6
 mNIKgTVMtl1gYoJznCUmC5VLHL4jQy+kSNXnyHBQOEEvTcORu0EztJS81H+BODni
 sxa9seem7JL9NLUTmkJsbGfSM6RKdfypX34oik9yakqUnXRrlxkxI+IX26XfdQ4=
 =2MAO
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2019-08-15' into staging

nbd patches for 2019-08-15

- Addition of InetSocketAddress keep-alive
- Addition of BDRV_REQ_PREFETCH for more efficient copy-on-read
- Initial refactoring in preparation of NBD reconnect

# gpg: Signature made Thu 15 Aug 2019 19:28:41 BST
# gpg:                using RSA key A7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full]
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full]
# gpg:                 aka "[jpeg image of size 6874]" [full]
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-nbd-2019-08-15:
  block/nbd: refactor nbd connection parameters
  block/nbd: add cmdline and qapi parameter reconnect-delay
  block/nbd: move from quit to state
  block/nbd: use non-blocking io channel for nbd negotiation
  block/nbd: split connection_co start out of nbd_client_connect
  nbd: improve CMD_CACHE: use BDRV_REQ_PREFETCH
  block/stream: use BDRV_REQ_PREFETCH
  block: implement BDRV_REQ_PREFETCH
  qapi: Add InetSocketAddress member keep-alive

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-08-16 15:53:37 +01:00
Markus Armbruster 54d31236b9 sysemu: Split sysemu/runstate.h off sysemu/sysemu.h
sysemu/sysemu.h is a rather unfocused dumping ground for stuff related
to the system-emulator.  Evidence:

* It's included widely: in my "build everything" tree, changing
  sysemu/sysemu.h still triggers a recompile of some 1100 out of 6600
  objects (not counting tests and objects that don't depend on
  qemu/osdep.h, down from 5400 due to the previous two commits).

* It pulls in more than a dozen additional headers.

Split stuff related to run state management into its own header
sysemu/runstate.h.

Touching sysemu/sysemu.h now recompiles some 850 objects.  qemu/uuid.h
also drops from 1100 to 850, and qapi/qapi-types-run-state.h from 4400
to 4200.  Touching new sysemu/runstate.h recompiles some 500 objects.

Since I'm touching MAINTAINERS to add sysemu/runstate.h anyway, also
add qemu/main-loop.h.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190812052359.30071-30-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
[Unbreak OS-X build]
2019-08-16 13:37:36 +02:00
Markus Armbruster d5938f29fe Clean up inclusion of sysemu/sysemu.h
In my "build everything" tree, changing sysemu/sysemu.h triggers a
recompile of some 5400 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).

Almost a third of its inclusions are actually superfluous.  Delete
them.  Downgrade two more to qapi/qapi-types-run-state.h, and move one
from char/serial.h to char/serial.c.

hw/semihosting/config.c, monitor/monitor.c, qdev-monitor.c, and
stubs/semihost.c define variables declared in sysemu/sysemu.h without
including it.  The compiler is cool with that, but include it anyway.

This doesn't reduce actual use much, as it's still included into
widely included headers.  The next commit will tackle that.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190812052359.30071-27-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
2019-08-16 13:31:53 +02:00
Markus Armbruster db72581598 Include qemu/main-loop.h less
In my "build everything" tree, changing qemu/main-loop.h triggers a
recompile of some 5600 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).  It includes block/aio.h,
which in turn includes qemu/event_notifier.h, qemu/notify.h,
qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h,
qemu/thread.h, qemu/timer.h, and a few more.

Include qemu/main-loop.h only where it's needed.  Touching it now
recompiles only some 1700 objects.  For block/aio.h and
qemu/event_notifier.h, these numbers drop from 5600 to 2800.  For the
others, they shrink only slightly.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190812052359.30071-21-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16 13:31:52 +02:00
Markus Armbruster 13d4ff07e8 trace: Do not include qom/cpu.h into generated trace.h
docs/devel/tracing.txt explains "since many source files include
trace.h, [the generated trace.h use] a minimum of types and other
header files included to keep the namespace clean and compile times
and dependencies down."

Commit 4815185902 "trace: Add per-vCPU tracing states for events with
the 'vcpu' property" made them all include qom/cpu.h via
control-internal.h.  qom/cpu.h in turn includes about thirty headers.
Ouch.

Per-vCPU tracing is currently not supported in sub-directories'
trace-events.  In other words, qom/cpu.h can only be used in
trace-root.h, not in any trace.h.

Split trace/control-vcpu.h off trace/control.h and
trace/control-internal.h.  Have the generated trace.h include
trace/control.h (which no longer includes qom/cpu.h), and trace-root.h
include trace/control-vcpu.h (which includes it).

The resulting improvement is a bit disappointing: in my "build
everything" tree, some 1100 out of 6600 objects (not counting tests
and objects that don't depend on qemu/osdep.h) depend on a trace.h,
and about 600 of them no longer depend on qom/cpu.h.  But more than
1300 others depend on trace-root.h.  More work is clearly needed.
Left for another day.

Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190812052359.30071-8-armbru@redhat.com>
2019-08-16 13:31:52 +02:00
Nir Soffer a6b257a08e file-posix: Handle undetectable alignment
In some cases buf_align or request_alignment cannot be detected:

1. With Gluster, buf_align cannot be detected since the actual I/O is
   done on Gluster server, and qemu buffer alignment does not matter.
   Since we don't have alignment requirement, buf_align=1 is the best
   value.

2. With local XFS filesystem, buf_align cannot be detected if reading
   from unallocated area. In this we must align the buffer, but we don't
   know what is the correct size. Using the wrong alignment results in
   I/O error.

3. With Gluster backed by XFS, request_alignment cannot be detected if
   reading from unallocated area. In this case we need to use the
   correct alignment, and failing to do so results in I/O errors.

4. With NFS, the server does not use direct I/O, so both buf_align cannot
   be detected. In this case we don't need any alignment so we can use
   buf_align=1 and request_alignment=1.

These cases seems to work when storage sector size is 512 bytes, because
the current code starts checking align=512. If the check succeeds
because alignment cannot be detected we use 512. But this does not work
for storage with 4k sector size.

To determine if we can detect the alignment, we probe first with
align=1. If probing succeeds, maybe there are no alignment requirement
(cases 1, 4) or we are probing unallocated area (cases 2, 3). Since we
don't have any way to tell, we treat this as undetectable alignment. If
probing with align=1 fails with EINVAL, but probing with one of the
expected alignments succeeds, we know that we found a working alignment.

Practically the alignment requirements are the same for buffer
alignment, buffer length, and offset in file. So in case we cannot
detect buf_align, we can use request alignment. If we cannot detect
request alignment, we can fallback to a safe value. To use this logic,
we probe first request alignment instead of buf_align.

Here is a table showing the behaviour with current code (the value in
parenthesis is the optimal value).

Case    Sector    buf_align (opt)   request_alignment (opt)     result
======================================================================
1       512       512   (1)          512   (512)                 OK
1       4096      512   (1)          4096  (4096)                FAIL
----------------------------------------------------------------------
2       512       512   (512)        512   (512)                 OK
2       4096      512   (4096)       4096  (4096)                FAIL
----------------------------------------------------------------------
3       512       512   (1)          512   (512)                 OK
3       4096      512   (1)          512   (4096)                FAIL
----------------------------------------------------------------------
4       512       512   (1)          512   (1)                   OK
4       4096      512   (1)          512   (1)                   OK

Same cases with this change:

Case    Sector    buf_align (opt)   request_alignment (opt)     result
======================================================================
1       512       512   (1)          512   (512)                 OK
1       4096      4096  (1)          4096  (4096)                OK
----------------------------------------------------------------------
2       512       512   (512)        512   (512)                 OK
2       4096      4096  (4096)       4096  (4096)                OK
----------------------------------------------------------------------
3       512       4096  (1)          4096  (512)                 OK
3       4096      4096  (1)          4096  (4096)                OK
----------------------------------------------------------------------
4       512       4096  (1)          4096  (1)                   OK
4       4096      4096  (1)          4096  (1)                   OK

I tested that provisioning VMs and copying disks on local XFS and
Gluster with 4k bytes sector size work now, resolving bugs [1],[2].
I tested also on XFS, NFS, Gluster with 512 bytes sector size.

[1] https://bugzilla.redhat.com/1737256
[2] https://bugzilla.redhat.com/1738657

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-08-16 11:29:11 +02:00
Kevin Wolf cf3129323f block-backend: Queue requests while drained
This fixes devices like IDE that can still start new requests from I/O
handlers in the CPU thread while the block backend is drained.

The basic assumption is that in a drain section, no new requests should
be allowed through a BlockBackend (blk_drained_begin/end don't exist,
we get drain sections only on the node level). However, there are two
special cases where requests should not be queued:

1. Block jobs: We already make sure that block jobs are paused in a
   drain section, so they won't start new requests. However, if the
   drain_begin is called on the job's BlockBackend first, it can happen
   that we deadlock because the job stays busy until it reaches a pause
   point - which it can't if its requests aren't processed any more.

   The proper solution here would be to make all requests through the
   job's filter node instead of using a BlockBackend. For now, just
   disabling request queuing on the job BlockBackend is simpler.

2. In test cases where making requests through bdrv_* would be
   cumbersome because we'd need a BdrvChild. As we already got the
   functionality to disable request queuing from 1., use it in tests,
   too, for convenience.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2019-08-16 10:25:16 +02:00
Kevin Wolf d2da5e288a mirror: Keep mirror_top_bs drained after dropping permissions
mirror_top_bs is currently implicitly drained through its connection to
the source or the target node. However, the drain section for target_bs
ends early after moving mirror_top_bs from src to target_bs, so that
requests can already be restarted while mirror_top_bs is still present
in the chain, but has dropped all permissions and therefore runs into an
assertion failure like this:

    qemu-system-x86_64: block/io.c:1634: bdrv_co_write_req_prepare:
    Assertion `child->perm & BLK_PERM_WRITE' failed.

Keep mirror_top_bs drained until all graph changes have completed.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2019-08-16 10:25:16 +02:00
Kevin Wolf 421919d76b block: Remove blk_pread_unthrottled()
The functionality offered by blk_pread_unthrottled() goes back to commit
498e386c58. Then, we couldn't perform I/O throttling with synchronous
requests because timers wouldn't be executed in polling loops. So the
commit automatically disabled I/O throttling as soon as a synchronous
request was issued.

However, for geometry detection during disk initialisation, we always
used (and still use) synchronous requests even if guest requests use AIO
later. Geometry detection was not wanted to disable I/O throttling, so
bdrv_pread_unthrottled() was introduced which disabled throttling only
temporarily.

All of this isn't necessary any more because we do run timers in polling
loop and even synchronous requests are now using coroutine
infrastructure internally. For this reason, commit 90c78624f already
removed the automatic disabling of I/O throttling.

It's time to get rid of the workaround for the removed code, and its
abuse of blk_root_drained_begin()/end(), as well.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2019-08-16 10:25:16 +02:00
Vladimir Sementsov-Ogievskiy 8f071c9db5 block/nbd: refactor nbd connection parameters
We'll need some connection parameters to be available all the time to
implement nbd reconnect. So, let's refactor them: define additional
parameters in BDRVNBDState, drop them from function parameters, drop
nbd_client_init and separate options parsing instead from nbd_open.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190618114328.55249-6-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: Drop useless 'if' before object_unref]
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-08-15 13:22:14 -05:00
Vladimir Sementsov-Ogievskiy b172ae2e0e block/nbd: add cmdline and qapi parameter reconnect-delay
Reconnect will be implemented in the following commit, so for now,
in semantics below, disconnect itself is a "serious error".

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190618114328.55249-5-vsementsov@virtuozzo.com>
[eblake: slipped from 4.1 to 4.2]
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-08-15 13:22:14 -05:00
Vladimir Sementsov-Ogievskiy a34b1e5e06 block/nbd: move from quit to state
To implement reconnect we need several states for the client:
CONNECTED, QUIT and two different CONNECTING states. CONNECTING states
will be added in the following patches. This patch implements CONNECTED
and QUIT.

QUIT means, that we should close the connection and fail all current
and further requests (like old quit = true).

CONNECTED means that connection is ok, we can send requests (like old
quit = false).

For receiving loop we use a comparison of the current state with QUIT,
because reconnect will be in the same loop, so it should be looping
until the end.

Opposite, for requests we use a comparison of the current state with
CONNECTED, as we don't want to send requests in future CONNECTING
states.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190618114328.55249-4-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-08-15 13:22:14 -05:00
Vladimir Sementsov-Ogievskiy a8e2bb6a76 block/nbd: use non-blocking io channel for nbd negotiation
No reason to use blocking channel for negotiation and we'll benefit in
further reconnect feature, as qio_channel reads and writes will do
qemu_coroutine_yield while waiting for io completion.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190618114328.55249-3-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-08-15 13:22:14 -05:00
Vladimir Sementsov-Ogievskiy 962b7b3d4c block/nbd: split connection_co start out of nbd_client_connect
nbd_client_connect is going to be used from connection_co, so, let's
refactor nbd_client_connect in advance, leaving io channel
configuration all in nbd_client_connect.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190618114328.55249-2-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-08-15 13:22:13 -05:00
Vladimir Sementsov-Ogievskiy 99136607b1 block/stream: use BDRV_REQ_PREFETCH
This helps to avoid extra io, allocations and memory copying.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190725100550.33801-3-vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[eblake: fix comment grammar]
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-08-15 13:22:13 -05:00
Vladimir Sementsov-Ogievskiy 3299e5ecf7 block: implement BDRV_REQ_PREFETCH
Do effective copy-on-read request when we don't need data actually. It
will be used for block-stream and NBD_CMD_CACHE.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190725100550.33801-2-vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[eblake: comment grammar fix]
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-08-15 13:22:13 -05:00
Vladimir Sementsov-Ogievskiy 110571be4e block/backup: disable copy_range for compressed backup
Enabled by default copy_range ignores compress option. It's definitely
unexpected for user.

It's broken since introduction of copy_range usage in backup in
9ded4a0114.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190730163251.755248-3-vsementsov@virtuozzo.com
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-08-06 13:17:27 +02:00
Max Reitz 9adc1cb49a mirror: Only mirror granularity-aligned chunks
In write-blocking mode, all writes to the top node directly go to the
target.  We must only mirror chunks of data that are aligned to the
job's granularity, because that is how the dirty bitmap works.
Therefore, the request alignment for writes must be the job's
granularity (in write-blocking mode).

Unfortunately, this forces all reads and writes to have the same
granularity (we only need this alignment for writes to the target, not
the source), but that is something to be fixed another time.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190805153308.2657-1-mreitz@redhat.com
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Fixes: d06107ade0
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-08-06 13:17:25 +02:00
Max Reitz 4a5b91ca02 backup: Copy only dirty areas
The backup job must only copy areas that the copy_bitmap reports as
dirty.  This is always the case when using traditional non-offloading
backup, because it copies each cluster separately.  When offloading the
copy operation, we sometimes copy more than one cluster at a time, but
we only check whether the first one is dirty.

Therefore, whenever copy offloading is possible, the backup job
currently produces wrong output when the guest writes to an area of
which an inner part has already been backed up, because that inner part
will be re-copied.

Fixes: 9ded4a0114
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190801173900.23851-2-mreitz@redhat.com
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-08-06 13:17:01 +02:00
Max Reitz 1120407bdf nvme: Limit blkshift to 12 (for 4 kB blocks)
Linux does not support blocks greater than 4 kB anyway, so we might as
well limit blkshift to 12 and thus save us from some potential trouble.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190730114812.10493-1-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Coverity: CID 1403771
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-07-30 14:49:24 +02:00
Kevin Wolf 2b23f28639 block/copy-on-read: Fix permissions for inactive node
The copy-on-read drive must not request the WRITE_UNCHANGED permission
for its child if the node is inactive, otherwise starting a migration
destination with -incoming will fail because the child cannot provide
write access yet:

  qemu-system-x86_64: -blockdev copy-on-read,file=img,node-name=cor: Block node is read-only

Earlier QEMU versions additionally ran into an abort() on the migration
source side: bdrv_inactivate_recurse() failed to update permissions.
This is silently ignored today because it was only supposed to loosen
restrictions. This is the symptom that was originally reported here:

  https://bugzilla.redhat.com/show_bug.cgi?id=1733022

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2019-07-30 12:25:43 +02:00
Max Reitz 65181d6381 block: Dec. drained_end_counter before bdrv_wakeup
Decrementing drained_end_counter after bdrv_dec_in_flight() (which in
turn invokes bdrv_wakeup() and thus aio_wait_kick()) is not very clever.
We should decrement it beforehand, so that any waiting aio_poll() that
is woken by bdrv_dec_in_flight() sees the decremented
drained_end_counter.

Because the time window between decrementing drained_end_counter and
aio_wait_kick() is very small, I cannot supply a reliable regression
test.  However, running e.g. the /bdrv-drain/blockjob/iothread/drain_all
test in test-bdrv-drain has a small chance of hanging without this
patch (about 1/200 or so; it gets to nearly 100 % if you add e.g. an
fputc(' ', stderr); after the bdrv_dec_in_flight()).

Fixes: e037c09c78
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190722133054.21781-2-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-07-22 18:41:35 +02:00
Maxim Levitsky 258867d1dc block/nvme: don't touch the completion entries
Completion entries are meant to be only read by the host and written by the device.
The driver is supposed to scan the completions from the last point where it left,
and until it sees a completion with non flipped phase bit.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190716163020.13383-4-mlevitsk@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-07-22 18:40:32 +02:00
Maxim Levitsky 118d1b6a81 block/nvme: support larger that 512 bytes sector devices
Currently the driver hardcodes the sector size to 512,
and doesn't check the underlying device. Fix that.

Also fail if underlying nvme device is formatted with metadata
as this needs special support.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-id: 20190716163020.13383-3-mlevitsk@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-07-22 18:40:32 +02:00
Maxim Levitsky 461bba04bf block/nvme: fix doorbell stride
Fix the math involving non standard doorbell stride

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190716163020.13383-2-mlevitsk@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-07-22 18:40:32 +02:00
Peter Maydell b9e02bb3f9 nbd patches for 2019-07-19
- silence harmless compiler/valgrind warning
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJdMiVIAAoJEKeha0olJ0NqhSkH/RF4EViyGi/BW9rHHmZIKQjL
 h71g+Y5l0QvTHW2qkU9UCjYC7lrhTnD8r06v2qQZCk37Rb0z/y8BGpcZQVE92nPt
 GW5VbqWh4IdNCMUNXlfFo/U2t+hJL+BAAd8OkolKcDU4FUuN9QGkvQztBLb6Edzk
 oz4/NbnFsd87TRI61EjOEspTpTO6cukLRmE0HIsL8KOaYo3E7QhgxRvR45Y58sbN
 TvUTpI0teHzo4MiXD/yRH8oDz7zKttFwAj3E0oU9IcVTyBJXpE8lNA4J2a65KdbH
 S+43tlNDJA1a5+OqyfzFHcgz6nLCcntU4+5LH7YfTy753EurDVqsi7vTo4lJqHM=
 =uPfp
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2019-07-19' into staging

nbd patches for 2019-07-19

- silence harmless compiler/valgrind warning

# gpg: Signature made Fri 19 Jul 2019 21:17:12 BST
# gpg:                using RSA key A7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full]
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full]
# gpg:                 aka "[jpeg image of size 6874]" [full]
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-nbd-2019-07-19:
  nbd: Initialize reply on failure

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-07-22 10:11:28 +01:00
Eric Blake 5cf42b1c1f nbd: Initialize reply on failure
We've had two separate reports of different callers running into use
of uninitialized data if s->quit is set (one detected by gcc -O3,
another by valgrind), due to checking 'nbd_reply_is_simple(reply) ||
s->quit' in the wrong order. Rather than chasing down which callers
need to pre-initialize reply, and whether there are any other
uninitialized uses, it's easier to guarantee that reply will always be
set by nbd_co_receive_one_chunk() even on failure.

The uninitialized use happens to be harmless (the only time the
variable is uninitialized is if s->quit is set, so the conditional
results in the same action regardless of what was read from reply),
and was introduced in commit 65e01d47.

In fixing the problem, it can also be seen that all (one) callers pass
in a non-NULL reply, so there is a dead conditional to also be cleaned
up.

Reported-by: Thomas Huth <thuth@redhat.com>
Reported-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190719172001.19770-1-eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-07-19 13:19:18 -05:00
Max Reitz 61ad631cee block: Loop unsafely in bdrv*drained_end()
The graph must not change in these loops (or a QLIST_FOREACH_SAFE would
not even be enough).  We now ensure this by only polling once in the
root bdrv_drained_end() call, so we can drop the _SAFE suffix.  Doing so
makes it clear that the graph must not change.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-07-19 13:19:17 +02:00
Max Reitz e037c09c78 block: Do not poll in bdrv_do_drained_end()
We should never poll anywhere in bdrv_do_drained_end() (including its
recursive callees like bdrv_drain_invoke()), because it does not cope
well with graph changes.  In fact, it has been written based on the
postulation that no graph changes will happen in it.

Instead, the callers that want to poll must poll, i.e. all currently
globally available wrappers: bdrv_drained_end(),
bdrv_subtree_drained_end(), bdrv_unapply_subtree_drain(), and
bdrv_drain_all_end().  Graph changes there do not matter.

They can poll simply by passing a pointer to a drained_end_counter and
wait until it reaches 0.

This patch also adds a non-polling global wrapper for
bdrv_do_drained_end() that takes a drained_end_counter pointer.  We need
such a variant because now no function called anywhere from
bdrv_do_drained_end() must poll.  This includes
BdrvChildRole.drained_end(), which already must not poll according to
its interface documentation, but bdrv_child_cb_drained_end() just
violates that by invoking bdrv_drained_end() (which does poll).
Therefore, BdrvChildRole.drained_end() must take a *drained_end_counter
parameter, which bdrv_child_cb_drained_end() can pass on to the new
bdrv_drained_end_no_poll() function.

Note that we now have a pattern of all drained_end-related functions
either polling or receiving a *drained_end_counter to let the caller
poll based on that.

A problem with a single poll loop is that when the drained section in
bdrv_set_aio_context_ignore() ends, some nodes in the subgraph may be in
the old contexts, while others are in the new context already.  To let
the collective poll in bdrv_drained_end() work correctly, we must not
hold a lock to the old context, so that the old context can make
progress in case it is different from the current context.

(In the process, remove the comment saying that the current context is
always the old context, because it is wrong.)

In all other places, all nodes in a subtree must be in the same context,
so we can just poll that.  The exception of course is
bdrv_drain_all_end(), but that always runs in the main context, so we
can just poll NULL (like bdrv_drain_all_begin() does).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-07-19 13:19:16 +02:00
Max Reitz f4c8a43be0 block: Make bdrv_parent_drained_[^_]*() static
These functions are not used outside of block/io.c, there is no reason
why they should be globally available.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-07-19 13:19:16 +02:00
Max Reitz 8e1da77e6e block: Add @drained_end_counter
Callers can now pass a pointer to an integer that bdrv_drain_invoke()
(and its recursive callees) will increment for every
bdrv_drain_invoke_entry() operation they schedule.
bdrv_drain_invoke_entry() in turn will decrement it once it has invoked
BlockDriver.bdrv_co_drain_end().

We use atomic operations to access the pointee, because the
bdrv_do_drained_end() caller may wish to end drained sections for
multiple nodes in different AioContexts (bdrv_drain_all_end() does, for
example).

This is the first step to moving the polling for BdrvCoDrainData.done to
become true out of bdrv_drain_invoke() and into the root drained_end
function.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-07-19 13:19:16 +02:00
Max Reitz 804db8ea00 block: Introduce BdrvChild.parent_quiesce_counter
Commit 5cb2737e92 laid out why
bdrv_do_drained_end() must decrement the quiesce_counter after
bdrv_drain_invoke().  It did not give a very good reason why it has to
happen after bdrv_parent_drained_end(), instead only claiming symmetry
to bdrv_do_drained_begin().

It turns out that delaying it for so long is wrong.

Situation: We have an active commit job (i.e. a mirror job) from top to
base for the following graph:

                  filter
                    |
                  [file]
                    |
                    v
top --[backing]--> base

Now the VM is closed, which results in the job being cancelled and a
bdrv_drain_all() happening pretty much simultaneously.

Beginning the drain means the job is paused once whenever one of its
nodes is quiesced.  This is reversed when the drain ends.

With how the code currently is, after base's drain ends (which means
that it will have unpaused the job once), its quiesce_counter remains at
1 while it goes to undrain its parents (bdrv_parent_drained_end()).  For
some reason or another, undraining filter causes the job to be kicked
and enter mirror_exit_common(), where it proceeds to invoke
block_job_remove_all_bdrv().

Now base will be detached from the job.  Because its quiesce_counter is
still 1, it will unpause the job once more.  So in total, undraining
base will unpause the job twice.  Eventually, this will lead to the
job's pause_count going negative -- well, it would, were there not an
assertion against this, which crashes qemu.

The general problem is that if in bdrv_parent_drained_end() we undrain
parent A, and then undrain parent B, which then leads to A detaching the
child, bdrv_replace_child_noperm() will undrain A as if we had not done
so yet; that is, one time too many.

It follows that we cannot decrement the quiesce_counter after invoking
bdrv_parent_drained_end().

Unfortunately, decrementing it before bdrv_parent_drained_end() would be
wrong, too.  Imagine the above situation in reverse: Undraining A leads
to B detaching the child.  If we had already decremented the
quiesce_counter by that point, bdrv_replace_child_noperm() would undrain
B one time too little; because it expects bdrv_parent_drained_end() to
issue this undrain.  But bdrv_parent_drained_end() won't do that,
because B is no longer a parent.

Therefore, we have to do something else.  This patch opts for
introducing a second quiesce_counter that counts how many times a
child's parent has been quiesced (though c->role->drained_*).  With
that, bdrv_replace_child_noperm() just has to undrain the parent exactly
that many times when removing a child, and it will always be right.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-07-19 13:19:16 +02:00
Peter Maydell 697f59243f * VFIO bugfix for AMD SEV (Alex)
* Kconfig improvements (Julio, Philippe)
 * MemoryRegion reference counting bugfix (King Wang)
 * Build system cleanups (Marc-André, myself)
 * rdmacm-mux off-by-one (Marc-André)
 * ZBC passthrough fixes (Shinichiro, myself)
 * WHPX build fix (Stefan)
 * char-pty fix (Wei Yang)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJdLX1PAAoJEL/70l94x66DQ1YH/im8BbRRRPsm3Qg55fTolcWN
 0+dm/Vfv2P7nfxDMsZ4S+jrvCaCWOZb6ua75TdB74VIXpJTGPU7a3JxyTzRueP+2
 c4WH3owT8x9e4iyLNGZoIDAKtJXLSX6FInjHKTkupLVbs2UpAh0Mipq4zIoIambl
 wf83jFmJ6KCemayE9gfw8Z45YTJcLceIOLaEyXgqrPoHXTmerEj5ZMMIqEMag3W/
 dKszhVjRb6En5Ldn0jEqeC5fU10tKIs+y7VNwdJ8CZw41daBDiXDVmXemJyTF/Xn
 SYJCwrJUSdVU42AE2xXCpBfANCh7eGyg4loCitLv8Z393tN7bRufULsnM/rEreI=
 =tEVO
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* VFIO bugfix for AMD SEV (Alex)
* Kconfig improvements (Julio, Philippe)
* MemoryRegion reference counting bugfix (King Wang)
* Build system cleanups (Marc-André, myself)
* rdmacm-mux off-by-one (Marc-André)
* ZBC passthrough fixes (Shinichiro, myself)
* WHPX build fix (Stefan)
* char-pty fix (Wei Yang)

# gpg: Signature made Tue 16 Jul 2019 08:31:27 BST
# gpg:                using RSA key BFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream:
  vl: make sure char-pty message displayed by moving setbuf to the beginning
  create_config: remove $(CONFIG_SOFTMMU) hack
  Makefile: do not repeat $(CONFIG_SOFTMMU) in hw/Makefile.objs
  hw/usb/Kconfig: USB_XHCI_NEC requires USB_XHCI
  hw/usb/Kconfig: Add CONFIG_USB_EHCI_PCI
  target/i386: sev: Do not unpin ram device memory region
  checkpatch: detect doubly-encoded UTF-8
  hw/lm32/Kconfig: Milkymist One provides a USB 1.1 Controller
  util: merge main-loop.c and iohandler.c
  Fix broken build with WHPX enabled
  memory: unref the memory region in simplify flatview
  hw/i386: turn off vmport if CONFIG_VMPORT is disabled
  rdmacm-mux: fix strcpy string warning
  build-sys: remove slirp cflags from main-loop.o
  iscsi: base all handling of check condition on scsi_sense_to_errno
  iscsi: fix busy/timeout/task set full
  scsi: add guest-recoverable ZBC errors
  scsi: explicitly list guest-recoverable sense codes
  scsi-disk: pass sense correctly for guest-recoverable errors

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-07-16 15:08:29 +01:00
Stefano Garzarella 0b1847bbc2 gluster: fix .bdrv_reopen_prepare when backing file is a JSON object
When the backing_file is specified as a JSON object, the
qemu_gluster_reopen_prepare() fails with this message:
    invalid URI json:{"server.0.host": ...}

In this case, we should call qemu_gluster_init() using the QDict
'state->options' that contains the JSON parameters already parsed.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1542445
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20190715132844.506584-1-sgarzare@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-07-15 15:48:41 +02:00
Max Reitz 8441d82d51 block/stream: Swap backing file change order
bdrv_change_backing_file() can result in yields.  Therefore, @base may
no longer be the the backing_bs() of s->bottom afterwards.

Just swap the order of the two calls to fix this.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190703172813.6868-4-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-07-15 15:48:40 +02:00
Max Reitz 17a7c39248 block/stream: Fix error path
As of commit c624b015bf, the stream job
only freezes the chain until the overlay of the base node.  The error
path must consider this.

Fixes: c624b015bf
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190703172813.6868-3-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-07-15 15:48:40 +02:00
Max Reitz e5182c1c57 block: Add BDS.never_freeze
The commit and the mirror block job must be able to drop their filter
node at any point.  However, this will not be possible if any of the
BdrvChild links to them is frozen.  Therefore, we need to prevent them
from ever becoming frozen.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190703172813.6868-2-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-07-15 15:48:40 +02:00
Michal Privoznik 95667c3be0 nvme: Set number of queues later in nvme_init()
When creating the admin queue in nvme_init() the variable that
holds the number of queues created is modified before actual
queue creation. This is a problem because if creating the queue
fails then the variable is left in inconsistent state. This was
actually observed when I tried to hotplug a nvme disk. The
control got to nvme_file_open() which called nvme_init() which
failed and thus nvme_close() was called which in turn called
nvme_free_queue_pair() with queue being NULL. This lead to an
instant crash:

  #0  0x000055d9507ec211 in nvme_free_queue_pair (bs=0x55d952ddb880, q=0x0) at block/nvme.c:164
  #1  0x000055d9507ee180 in nvme_close (bs=0x55d952ddb880) at block/nvme.c:729
  #2  0x000055d9507ee3d5 in nvme_file_open (bs=0x55d952ddb880, options=0x55d952bb1410, flags=147456, errp=0x7ffd8e19e200) at block/nvme.c:781
  #3  0x000055d9507629f3 in bdrv_open_driver (bs=0x55d952ddb880, drv=0x55d95109c1e0 <bdrv_nvme>, node_name=0x0, options=0x55d952bb1410, open_flags=147456, errp=0x7ffd8e19e310) at block.c:1291
  #4  0x000055d9507633d6 in bdrv_open_common (bs=0x55d952ddb880, file=0x0, options=0x55d952bb1410, errp=0x7ffd8e19e310) at block.c:1551
  #5  0x000055d950766881 in bdrv_open_inherit (filename=0x0, reference=0x0, options=0x55d952bb1410, flags=32768, parent=0x55d9538ce420, child_role=0x55d950eaade0 <child_file>, errp=0x7ffd8e19e510) at block.c:3063
  #6  0x000055d950765ae4 in bdrv_open_child_bs (filename=0x0, options=0x55d9541cdff0, bdref_key=0x55d950af33aa "file", parent=0x55d9538ce420, child_role=0x55d950eaade0 <child_file>, allow_none=true, errp=0x7ffd8e19e510) at block.c:2712
  #7  0x000055d950766633 in bdrv_open_inherit (filename=0x0, reference=0x0, options=0x55d9541cdff0, flags=0, parent=0x0, child_role=0x0, errp=0x7ffd8e19e908) at block.c:3011
  #8  0x000055d950766dba in bdrv_open (filename=0x0, reference=0x0, options=0x55d953d00390, flags=0, errp=0x7ffd8e19e908) at block.c:3156
  #9  0x000055d9507cb635 in blk_new_open (filename=0x0, reference=0x0, options=0x55d953d00390, flags=0, errp=0x7ffd8e19e908) at block/block-backend.c:389
  #10 0x000055d950465ec5 in blockdev_init (file=0x0, bs_opts=0x55d953d00390, errp=0x7ffd8e19e908) at blockdev.c:602

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Message-id: 927aae40b617ba7d4b6c7ffe74e6d7a2595f8e86.1562770546.git.mprivozn@redhat.com
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-07-15 15:48:40 +02:00
Paolo Bonzini 8c460269aa iscsi: base all handling of check condition on scsi_sense_to_errno
Now that scsi-disk is not using scsi_sense_to_errno to separate guest-recoverable
sense codes, we can modify it to simplify iscsi's own sense handling.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-15 11:20:42 +02:00
Paolo Bonzini 00e3cccdf4 iscsi: fix busy/timeout/task set full
In this case, do_retry was set without calling aio_co_wake, thus never
waking up the coroutine.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-15 11:20:42 +02:00
Maxim Levitsky 867eccfed8 file-posix: Use max transfer length/segment count only for SCSI passthrough
Regular kernel block devices (/dev/sda*, /dev/nvme*, etc) don't have
max segment size/max segment count hardware requirements exposed
to the userspace, but rather the kernel block layer
takes care to split the incoming requests that
violate these requirements.

Allowing the kernel to do the splitting allows qemu to avoid
various overheads that arise otherwise from this.

This is especially visible in nbd server,
exposing as a raw file, a mostly empty qcow2 image over the net.
In this case most of the reads by the remote user
won't even hit the underlying kernel block device,
and therefore most of the  overhead will be in the
nbd traffic which increases significantly with lower max transfer size.

In addition to that even for local block device
access the peformance improves a bit due to less
traffic between qemu and the kernel when large
transfer sizes are used (e.g for image conversion)

More info can be found at:
https://bugzilla.redhat.com/show_bug.cgi?id=1647104

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-07-12 15:42:23 +02:00
Eric Blake f7077c9860 qcow2: Allow -o compat=v3 during qemu-img amend
Commit b76b4f60 allowed '-o compat=v3' as an alias for the
less-appealing '-o compat=1.1' for 'qemu-img create' since we want to
use the QMP form as much as possible, but forgot to do likewise for
qemu-img amend.  Also, it doesn't help that '-o help' doesn't list our
new preferred spellings.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-07-08 16:00:31 +02:00
John Snow 197bfa7da7 block/qcow: Improve error when opening qcow2 files as qcow
Reported-by: radmehrsaeed7@gmail.com
Fixes: https://bugs.launchpad.net/bugs/1832914
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-07-08 16:00:26 +02:00
Andrey Shinkevich c624b015bf block/stream: introduce a bottom node
The bottom node is the intermediate block device that has the base as its
backing image. It is used instead of the base node while a block stream
job is running to avoid dependency on the base that may change due to the
parallel jobs. The change may take place due to a filter node as well that
is inserted between the base and the intermediate bottom node. It occurs
when the base node is the top one for another commit or stream job.
After the introduction of the bottom node, don't freeze its backing child,
that's the base, anymore.

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 1559152576-281803-4-git-send-email-andrey.shinkevich@virtuozzo.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-07-02 03:53:05 +02:00
Andrey Shinkevich 96a07d5bf4 block/stream: refactor stream_run: drop goto
The goto is unnecessary in the stream_run() since the common exit
code was removed in the commit eb23654dbe43b549ea2a9ebff9d8e:
"jobs: utilize job_exit shim".

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1559152576-281803-3-git-send-email-andrey.shinkevich@virtuozzo.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-07-02 03:53:04 +02:00
Andrey Shinkevich 170d3bd341 block: include base when checking image chain for block allocation
This patch is used in the 'block/stream: introduce a bottom node'
that is following. Instead of the base node, the caller may pass
the node that has the base as its backing image to the function
bdrv_is_allocated_above() with a new parameter include_base = true
and get rid of the dependency on the base that may change during
commit/stream parallel jobs. Now, if the specified base is not
found in the backing image chain, the QEMU will abort.

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 1559152576-281803-2-git-send-email-andrey.shinkevich@virtuozzo.com
[mreitz: Squashed in the following as a rebase on conflicting patches:]
Message-id: e3cf99ae-62e9-8b6e-5a06-d3c8b9363b85@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-07-02 03:53:04 +02:00
Stefano Garzarella d24f80234b block/rbd: increase dynamically the image size
RBD APIs don't allow us to write more than the size set with
rbd_create() or rbd_resize().
In order to support growing images (eg. qcow2), we resize the
image before write operations that exceed the current size.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20190509145927.293369-1-sgarzare@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-07-02 03:53:04 +02:00
Pino Toscano b10d49d761 ssh: switch from libssh2 to libssh
Rewrite the implementation of the ssh block driver to use libssh instead
of libssh2.  The libssh library has various advantages over libssh2:
- easier API for authentication (for example for using ssh-agent)
- easier API for known_hosts handling
- supports newer types of keys in known_hosts

Use APIs/features available in libssh 0.8 conditionally, to support
older versions (which are not recommended though).

Adjust the iotest 207 according to the different error message, and to
find the default key type for localhost (to properly compare the
fingerprint with).
Contributed-by: Max Reitz <mreitz@redhat.com>

Adjust the various Docker/Travis scripts to use libssh when available
instead of libssh2. The mingw/mxe testing is dropped for now, as there
are no packages for it.

Signed-off-by: Pino Toscano <ptoscano@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20190620200840.17655-1-ptoscano@redhat.com
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 5873173.t2JhDm7DL7@lindworm.usersys.redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-06-24 16:01:04 +02:00
Sam Eiderman 98eb9733f4 vmdk: Add read-only support for seSparse snapshots
Until ESXi 6.5 VMware used the vmfsSparse format for snapshots (VMDK3 in
QEMU).

This format was lacking in the following:

    * Grain directory (L1) and grain table (L2) entries were 32-bit,
      allowing access to only 2TB (slightly less) of data.
    * The grain size (default) was 512 bytes - leading to data
      fragmentation and many grain tables.
    * For space reclamation purposes, it was necessary to find all the
      grains which are not pointed to by any grain table - so a reverse
      mapping of "offset of grain in vmdk" to "grain table" must be
      constructed - which takes large amounts of CPU/RAM.

The format specification can be found in VMware's documentation:
https://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf

In ESXi 6.5, to support snapshot files larger than 2TB, a new format was
introduced: SESparse (Space Efficient).

This format fixes the above issues:

    * All entries are now 64-bit.
    * The grain size (default) is 4KB.
    * Grain directory and grain tables are now located at the beginning
      of the file.
      + seSparse format reserves space for all grain tables.
      + Grain tables can be addressed using an index.
      + Grains are located in the end of the file and can also be
        addressed with an index.
      - seSparse vmdks of large disks (64TB) have huge preallocated
        headers - mainly due to L2 tables, even for empty snapshots.
    * The header contains a reverse mapping ("backmap") of "offset of
      grain in vmdk" to "grain table" and a bitmap ("free bitmap") which
      specifies for each grain - whether it is allocated or not.
      Using these data structures we can implement space reclamation
      efficiently.
    * Due to the fact that the header now maintains two mappings:
        * The regular one (grain directory & grain tables)
        * A reverse one (backmap and free bitmap)
      These data structures can lose consistency upon crash and result
      in a corrupted VMDK.
      Therefore, a journal is also added to the VMDK and is replayed
      when the VMware reopens the file after a crash.

Since ESXi 6.7 - SESparse is the only snapshot format available.

Unfortunately, VMware does not provide documentation regarding the new
seSparse format.

This commit is based on black-box research of the seSparse format.
Various in-guest block operations and their effect on the snapshot file
were tested.

The only VMware provided source of information (regarding the underlying
implementation) was a log file on the ESXi:

    /var/log/hostd.log

Whenever an seSparse snapshot is created - the log is being populated
with seSparse records.

Relevant log records are of the form:

[...] Const Header:
[...]  constMagic     = 0xcafebabe
[...]  version        = 2.1
[...]  capacity       = 204800
[...]  grainSize      = 8
[...]  grainTableSize = 64
[...]  flags          = 0
[...] Extents:
[...]  Header         : <1 : 1>
[...]  JournalHdr     : <2 : 2>
[...]  Journal        : <2048 : 2048>
[...]  GrainDirectory : <4096 : 2048>
[...]  GrainTables    : <6144 : 2048>
[...]  FreeBitmap     : <8192 : 2048>
[...]  BackMap        : <10240 : 2048>
[...]  Grain          : <12288 : 204800>
[...] Volatile Header:
[...] volatileMagic     = 0xcafecafe
[...] FreeGTNumber      = 0
[...] nextTxnSeqNumber  = 0
[...] replayJournal     = 0

The sizes that are seen in the log file are in sectors.
Extents are of the following format: <offset : size>

This commit is a strict implementation which enforces:
    * magics
    * version number 2.1
    * grain size of 8 sectors  (4KB)
    * grain table size of 64 sectors
    * zero flags
    * extent locations

Additionally, this commit proivdes only a subset of the functionality
offered by seSparse's format:
    * Read-only
    * No journal replay
    * No space reclamation
    * No unmap support

Hence, journal header, journal, free bitmap and backmap extents are
unused, only the "classic" (L1 -> L2 -> data) grain access is
implemented.

However there are several differences in the grain access itself.
Grain directory (L1):
    * Grain directory entries are indexes (not offsets) to grain
      tables.
    * Valid grain directory entries have their highest nibble set to
      0x1.
    * Since grain tables are always located in the beginning of the
      file - the index can fit into 32 bits - so we can use its low
      part if it's valid.
Grain table (L2):
    * Grain table entries are indexes (not offsets) to grains.
    * If the highest nibble of the entry is:
        0x0:
            The grain in not allocated.
            The rest of the bytes are 0.
        0x1:
            The grain is unmapped - guest sees a zero grain.
            The rest of the bits point to the previously mapped grain,
            see 0x3 case.
        0x2:
            The grain is zero.
        0x3:
            The grain is allocated - to get the index calculate:
            ((entry & 0x0fff000000000000) >> 48) |
            ((entry & 0x0000ffffffffffff) << 12)
    * The difference between 0x1 and 0x2 is that 0x1 is an unallocated
      grain which results from the guest using sg_unmap to unmap the
      grain - but the grain itself still exists in the grain extent - a
      space reclamation procedure should delete it.
      Unmapping a zero grain has no effect (0x2 will not change to 0x1)
      but unmapping an unallocated grain will (0x0 to 0x1) - naturally.

In order to implement seSparse some fields had to be changed to support
both 32-bit and 64-bit entry sizes.

Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
Message-id: 20190620091057.47441-4-shmuel.eiderman@oracle.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-06-24 15:53:02 +02:00
Sam Eiderman 59d6ee4850 vmdk: Reduce the max bound for L1 table size
512M of L1 entries is a very loose bound, only 32M are required to store
the maximal supported VMDK file size of 2TB.

Fixed qemu-iotest 59# - now failure occures before on impossible L1
table size.

Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
Message-id: 20190620091057.47441-3-shmuel.eiderman@oracle.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-06-24 15:53:02 +02:00
Sam Eiderman 940a2cd5d2 vmdk: Fix comment regarding max l1_size coverage
Commit b0651b8c24 ("vmdk: Move l1_size check into vmdk_add_extent")
extended the l1_size check from VMDK4 to VMDK3 but did not update the
default coverage in the moved comment.

The previous vmdk4 calculation:

    (512 * 1024 * 1024) * 512(l2 entries) * 65536(grain) = 16PB

The added vmdk3 calculation:

    (512 * 1024 * 1024) * 4096(l2 entries) * 512(grain) = 1PB

Adding the calculation of vmdk3 to the comment.

In any case, VMware does not offer virtual disks more than 2TB for
vmdk4/vmdk3 or 64TB for the new undocumented seSparse format which is
not implemented yet in qemu.

Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
Message-id: 20190620091057.47441-2-shmuel.eiderman@oracle.com
Reviewed-by: yuchenlin <yuchenlin@synology.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-06-24 15:53:02 +02:00
Max Reitz a193ad3b3b block/commit: Drop bdrv_child_try_set_perm()
commit_top_bs never requests or unshares any permissions.  There is no
reason to make this so explicit here.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-18 16:41:10 +02:00
Max Reitz f94dc3b414 block/mirror: Fix child permissions
We cannot use bdrv_child_try_set_perm() to give up all restrictions on
the child edge, and still have bdrv_mirror_top_child_perm() request
BLK_PERM_WRITE.  Fix this by making bdrv_mirror_top_child_perm() return
0/BLK_PERM_ALL when we want to give up all permissions, and replacing
bdrv_child_try_set_perm() by bdrv_child_refresh_perms().

The bdrv_child_try_set_perm() before removing the node with
bdrv_replace_node() is then unnecessary.  No permissions have changed
since the previous invocation of bdrv_child_try_set_perm().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-18 16:41:10 +02:00
Max Reitz 094e363944 file-posix: Update open_flags in raw_set_perm()
raw_check_perm() + raw_set_perm() can change the flags associated with
the current FD.  If so, we have to update BDRVRawState.open_flags
accordingly.  Otherwise, we may keep reopening the FD even though the
current one already has the correct flags.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-18 16:41:10 +02:00
Vladimir Sementsov-Ogievskiy b23c580c94 block: drop bs->job
Drop remaining users of bs->job:
1. assertions actually duplicated by assert(!bs->refcnt)
2. trace-point seems not enough reason to change stream_start to return
   BlockJob pointer
3. Restricting creation of two jobs based on same bs is bad idea, as
   3.1 Some jobs creates filters to be their main node, so, this check
   don't actually prevent creating second job on same real node (which
   will create another filter node) (but I hope it is restricted by
   other mechanisms)
   3.2 Even without bs->job we have two systems of permissions:
   op-blockers and BLK_PERM
   3.3 We may want to run several jobs on one node one day

And finally, drop bs->job pointer itself. Hurrah!

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-18 16:41:10 +02:00
Vladimir Sementsov-Ogievskiy 68d00e4293 block/block-backend: blk_iostatus_reset: drop usage of bs->job
We are going to remove bs->job pointer. Drop it's usage in
blk_iostatus_reset.

blk_iostatus_reset() has only two callers:

1. blk_attach_dev(). This doesn't have anything to do with jobs and
    attaching a new guest device won't solve any problem the job
    encountered, so no reason to reset the iostatus for the job.

2. qmp_cont(). This resets the iostatus for everything. We can just
    call block_job_iostatus_reset() for all block jobs instead of going
    through BlockBackend.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-18 16:41:10 +02:00
Vladimir Sementsov-Ogievskiy cc19f1773d block/replication: drop usage of bs->job
We are going to remove bs->job pointer. Drop it's usage in replication
code. Additionally we have to return job pointer from some mirror APIs.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-18 16:41:09 +02:00
Max Reitz 1adb0b5e0f blkdebug: Inject errors on .bdrv_co_block_status()
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190507203508.18026-6-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-06-14 14:16:57 +02:00
Max Reitz f8cec157cb blkdebug: Add "none" event
Together with @iotypes and @sector, this can be used to trap e.g. the
first read or write access to a certain sector without having to know
what happens internally in the block layer, i.e. which "real" events
happen right before such an access.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190507203508.18026-5-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-06-14 14:16:57 +02:00
Max Reitz 16789db3de blkdebug: Add @iotype error option
This new error option allows users of blkdebug to inject errors only on
certain kinds of I/O operations.  Users usually want to make a very
specific operation fail, not just any; but right now they simply hope
that the event that triggers the error injection is followed up with
that very operation.  That may not be true, however, because the block
layer is changing (including blkdebug, which may increase the number of
types of I/O operations on which to inject errors).

The new option's default has been chosen to keep backwards
compatibility.

Note that similar to the internal representation, we could choose to
expose this option as a list of I/O types.  But there is no practical
use for this, because as described above, users usually know exactly
which kind of operation they want to make fail, so there is no need to
specify multiple I/O types at once.  In addition, exposing this option
as a list would require non-trivial changes to qemu_opts_absorb_qdict().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190507203508.18026-4-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-06-14 14:16:57 +02:00
Vladimir Sementsov-Ogievskiy 611ae1d716 block/nbd: merge NBDClientSession struct back to BDRVNBDState
No reason to keep it separate, it differs from others block driver
behavior and therefore confuses. Instead of generic
  'state = (State*)bs->opaque' we have to use special helper.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190611102720.86114-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-06-13 10:00:42 -05:00
Vladimir Sementsov-Ogievskiy 86f8cdf3db block/nbd: merge nbd-client.* to nbd.c
No reason for keeping driver handlers realization separate from driver
structure. We can get rid of extra header file.

While being here, fix comments style, restore forgotten comments for
NBD_FOREACH_REPLY_CHUNK and nbd_reply_chunk_iter_receive, remove extra
includes.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190611102720.86114-3-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-06-13 09:55:09 -05:00
Vladimir Sementsov-Ogievskiy 0a93b359db block/nbd-client: drop stale logout
Drop one on failure path (we have errp) and turn two others into trace
points.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190611102720.86114-2-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-06-13 09:35:53 -05:00
Stefano Garzarella 2ea8e96da2 block/gluster: update .help of BLOCK_OPT_PREALLOC option
Add missing 'falloc' among the allowed values of 'preallocation'
option; show it and 'full' only when they are supported.
('falloc' is supported if defined CONFIG_GLUSTERFS_FALLOCATE,
'full' is supported if defined CONFIG_GLUSTERFS_ZEROFILL)

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190524075848.23781-4-sgarzare@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2019-06-12 18:32:32 +02:00
Stefano Garzarella abea00533f block/file-posix: update .help of BLOCK_OPT_PREALLOC option
Show 'falloc' among the allowed values of 'preallocation'
option, only when it is supported (if defined CONFIG_POSIX_FALLOCATE)

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190524075848.23781-3-sgarzare@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2019-06-12 18:31:46 +02:00
Markus Armbruster a8d2532645 Include qemu-common.h exactly where needed
No header includes qemu-common.h after this commit, as prescribed by
qemu-common.h's file comment.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190523143508.25387-5-armbru@redhat.com>
[Rebased with conflicts resolved automatically, except for
include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c
block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c
target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h
target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h
target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h
target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and
net/tap-bsd.c fixed up]
2019-06-12 13:20:20 +02:00
Markus Armbruster 0b8fa32f55 Include qemu/module.h where needed, drop it from qemu-common.h
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190523143508.25387-4-armbru@redhat.com>
[Rebased with conflicts resolved automatically, except for
hw/usb/dev-hub.c hw/misc/exynos4210_rng.c hw/misc/bcm2835_rng.c
hw/misc/aspeed_scu.c hw/display/virtio-vga.c hw/arm/stm32f205_soc.c;
ui/cocoa.m fixed up]
2019-06-12 13:18:33 +02:00
Markus Armbruster 856dfd8a03 qemu-common: Move qemu_isalnum() etc. to qemu/ctype.h
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190523143508.25387-3-armbru@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-11 20:22:09 +02:00
Vladimir Sementsov-Ogievskiy d93e572688 block/io: bdrv_pdiscard: support int64_t bytes parameter
This fixes at least one overflow in qcow2_process_discards, which
passes 64bit region length to bdrv_pdiscard where bytes (or sectors in
the past) parameter is int since its introduction in 0b919fae.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04 16:55:58 +02:00
Vladimir Sementsov-Ogievskiy 1477b6c803 block/qcow2-refcount: add trace-point to qcow2_process_discards
Let's at least trace ignored failure.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04 16:55:58 +02:00
Kevin Wolf d0ee0204f4 block: Remove wrong bdrv_set_aio_context() calls
The mirror and commit block jobs use bdrv_set_aio_context() to move
their filter node into the right AioContext before hooking it up in the
graph. Similarly, bdrv_open_backing_file() explicitly moves the backing
file node into the right AioContext first.

This isn't necessary any more, they get automatically moved into the
right context now when attaching them.

However, in the case of bdrv_open_backing_file() with a node reference,
it's actually not only unnecessary, but even wrong: The unchecked
bdrv_set_aio_context() changes the AioContext of the child node even if
other parents require it to retain the old context. So this is not only
a simplification, but a bug fix, too.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1684342
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04 15:22:22 +02:00
Kevin Wolf 132ada80c4 block: Adjust AioContexts when attaching nodes
So far, we only made sure that updating the AioContext of a node
affected the whole subtree. However, if a node is newly attached to a
new parent, we also need to make sure that both the subtree of the node
and the parent are in the same AioContext. This tries to move the new
child node to the parent AioContext and returns an error if this isn't
possible.

BlockBackends now actually apply their AioContext to their root node.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04 15:22:22 +02:00
Kevin Wolf d861ab3acf block: Add BlockBackend.ctx
This adds a new parameter to blk_new() which requires its callers to
declare from which AioContext this BlockBackend is going to be used (or
the locks of which AioContext need to be taken anyway).

The given context is only stored and kept up to date when changing
AioContexts. Actually applying the stored AioContext to the root node
is saved for another commit.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04 15:22:22 +02:00
Kevin Wolf 97896a4887 block: Add Error to blk_set_aio_context()
Add an Error parameter to blk_set_aio_context() and use
bdrv_child_try_set_aio_context() internally to check whether all
involved nodes can actually support the AioContext switch.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04 15:22:22 +02:00
Julia Suvorova 2b02fd81de block/linux-aio: Drop unused BlockAIOCB submission method
Callback-based laio_submit() and laio_cancel() were left after
rewriting Linux AIO backend to coroutines in hope that they would be
used in other code that could bypass coroutines. They can be safely
removed because they have not been used since that time.

Signed-off-by: Julia Suvorova <jusual@mail.ru>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04 15:20:41 +02:00
Max Reitz 5cb2737e92 block/io: Delay decrementing the quiesce_counter
When ending a drained section, bdrv_do_drained_end() currently first
decrements the quiesce_counter, and only then actually ends the drain.

The bdrv_drain_invoke(bs, false) call may cause graph changes.  Say the
graph change involves replacing an existing BB's ("blk") BDS
(blk_bs(blk)) by @bs.  Let us introducing the following values:
- bs_oqc = old_quiesce_counter
  (so bs->quiesce_counter == bs_oqc - 1)
- obs_qc = blk_bs(blk)->quiesce_counter (before bdrv_drain_invoke())

Let us assume there is no blk_pread_unthrottled() involved, so
blk->quiesce_counter == obs_qc (before bdrv_drain_invoke()).

Now replacing blk_bs(blk) by @bs will reduce blk->quiesce_counter by
obs_qc (making it 0) and increase it by bs_oqc-1 (making it bs_oqc-1).

bdrv_drain_invoke() returns and we invoke bdrv_parent_drained_end().
This will decrement blk->quiesce_counter by one, so it would be -1 --
were there not an assertion against that in blk_root_drained_end().

We therefore have to keep the quiesce_counter up at least until
bdrv_drain_invoke() returns, so that bdrv_parent_drained_end() does the
right thing for the parents @bs got during bdrv_drain_invoke().

But let us delay it even further, namely until bdrv_parent_drained_end()
returns, because then it mirrors bdrv_do_drained_begin(): There, we
first increment the quiesce_counter, then begin draining the parents,
and then call bdrv_drain_invoke().  It makes sense to let
bdrv_do_drained_end() unravel this exactly in reverse.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04 15:20:41 +02:00
Vladimir Sementsov-Ogievskiy 69f47505ee block: avoid recursive block_status call if possible
drv_co_block_status digs bs->file for additional, more accurate search
for hole inside region, reported as DATA by bs since 5daa74a6eb.

This accuracy is not free: assume we have qcow2 disk. Actually, qcow2
knows, where are holes and where is data. But every block_status
request calls lseek additionally. Assume a big disk, full of
data, in any iterative copying block job (or img convert) we'll call
lseek(HOLE) on every iteration, and each of these lseeks will have to
iterate through all metadata up to the end of file. It's obviously
ineffective behavior. And for many scenarios we don't need this lseek
at all.

However, lseek is needed when we have metadata-preallocated image.

So, let's detect metadata-preallocation case and don't dig qcow2's
protocol file in other cases.

The idea is to compare allocation size in POV of filesystem with
allocations size in POV of Qcow2 (by refcounts). If allocation in fs is
significantly lower, consider it as metadata-preallocation case.

102 iotest changed, as our detector can't detect shrinked file as
metadata-preallocation, which don't seem to be wrong, as with metadata
preallocation we always have valid file length.

Two other iotests have a slight change in their QMP output sequence:
Active 'block-commit' returns earlier because the job coroutine yields
earlier on a blocking operation. This operation is loading the refcount
blocks in qcow2_detect_metadata_preallocation().

Suggested-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04 15:20:41 +02:00
Peter Maydell 62f6849e7a Pull request
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+ber27ys35W+dsvQfe+BBqr8OQ4FAlztyykACgkQfe+BBqr8
 OQ4EaA//QQVDpIBRcMN+LeKWeEs8VSLziPUrZuFvuhMEEnjnaU6gbKq8G8xbFQ62
 JIHg0DBGhTt8ymE9Ay6O/cooR8F0z+XyfDr7UlpI7JL/Uwl7JguGKQrWUYBRMqCv
 Q2cLaWStLkfdkuW7Y3WRc16VEnIlizDxjRzfjE2ESYpuzD2fFsBY3KZbgbJwYwZw
 SujWUQ3MdsNdw5kDmerlrDUy7r/eyl2cLXyIt6ClHNoqq392oGMoUn4XbsaLnCWE
 H5s46qm33eXtvBHqxVGoOMAli5FwCnhwF+H3xg93jIG6vC/RXQYCIhlEmEwKyrU2
 g2DWWe/8+9b0iX+zTIcAPTcn1pmjVivGRorOurP0AtMtjV/8PvV+hAQQeSg2ARB3
 rLpXaEphD4WTwu7mYlZ5kX0qvX2SftaMU08k1IgR3mfo8Z3X9znVoFIv8HLlHuy+
 OhCmwT5OWYw4mNABTXeBMH/Dcs9EcU4+T/KhAGLReHo18CSyjeT2xsT+XCsETagF
 KlAP88dP0EdJ9Oiccyb8as22u7ygKWIiDYPplBdb4SkKg/koQnYGDjeDAzB2vXS3
 cGVhGJD2DBbcePA8iaCfWzsSCDOTBFQLa45uhPD3DnkAJylhecSsiDQP+IrLslK3
 h/8v9e8MAlHMgrueSnS7foMDI9rdrTNsChuNCJWOOaUI/ZWnXFg=
 =kCrN
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/jnsnow/tags/bitmaps-pull-request' into staging

Pull request

# gpg: Signature made Wed 29 May 2019 00:58:33 BST
# gpg:                using RSA key F9B7ABDBBCACDF95BE76CBD07DEF8106AAFC390E
# gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>" [full]
# Primary key fingerprint: FAEB 9711 A12C F475 812F  18F2 88A9 064D 1835 61EB
#      Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76  CBD0 7DEF 8106 AAFC 390E

* remotes/jnsnow/tags/bitmaps-pull-request:
  iotests: test external snapshot with bitmap copying
  qapi: support external bitmaps in block-dirty-bitmap-merge
  migration/dirty-bitmaps: change bitmap enumeration method

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-05-30 12:10:27 +01:00
Vladimir Sementsov-Ogievskiy eff0829b07 qapi: support external bitmaps in block-dirty-bitmap-merge
Add new optional parameter making possible to merge bitmaps from
different nodes. It is needed to maintain external snapshots during
incremental backup chain history.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190517152111.206494-2-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-05-28 19:33:31 -04:00
Andrey Shinkevich 6388903e7c qcow2-bitmap: initialize bitmap directory alignment
Valgrind detects multiple issues in QEMU iotests when the memory is
used without being initialized. Valgrind may dump lots of unnecessary
reports what makes the memory issue analysis harder. Particularly,
that is true for the aligned bitmap directory and can be seen while
running the iotest #169. Padding the aligned space with zeros eases
the pain.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Message-id: 1558961521-131620-1-git-send-email-andrey.shinkevich@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-28 20:30:55 +02:00
Anton Nefedov c8bb23cbdb qcow2: skip writing zero buffers to empty COW areas
If COW areas of the newly allocated clusters are zeroes on the backing
image, efficient bdrv_write_zeroes(flags=BDRV_REQ_NO_FALLBACK) can be
used on the whole cluster instead of writing explicit zero buffers later
in perform_cow().

iotest 060:
write to the discarded cluster does not trigger COW anymore.
Use a backing image instead.

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Message-id: 20190516142749.81019-2-anton.nefedov@virtuozzo.com
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-28 20:30:55 +02:00
Alberto Garcia b441dc71c0 block: Make bdrv_root_attach_child() unref child_bs on failure
A consequence of the previous patch is that bdrv_attach_child()
transfers the reference to child_bs from the caller to parent_bs,
which will drop it on bdrv_close() or when someone calls
bdrv_unref_child().

But this only happens when bdrv_attach_child() succeeds. If it fails
then the caller is responsible for dropping the reference to child_bs.

This patch makes bdrv_attach_child() take the reference also when
there is an error, freeing the caller for having to do it.

A similar situation happens with bdrv_root_attach_child(), so the
changes on this patch affect both functions.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 20dfb3d9ccec559cdd1a9690146abad5d204a186.1557754872.git.berto@igalia.com
[mreitz: Removed now superfluous BdrvChild * variable in
         bdrv_open_child()]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy ae6b12fa4c block/backup: refactor: split out backup_calculate_cluster_size
Split out cluster_size calculation. Move copy-bitmap creation above
block-job creation, as we are going to share it with upcoming
backup-top filter, which also should be created before actual block job
creation.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190429090842.57910-6-vsementsov@virtuozzo.com
[mreitz: Dropped a paragraph from the commit message that was left over
         from a previous version]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy c334e897d0 block/backup: unify different modes code path
Do full, top and incremental mode copying all in one place. This
unifies the code path and helps further improvements.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190429090842.57910-5-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy 9eb5a248f3 block/backup: refactor and tolerate unallocated cluster skipping
Split allocation checking to separate function and reduce nesting.
Consider bdrv_is_allocated() fail as allocated area, as copying more
than needed is not wrong (and we do it anyway) and seems better than
fail the whole job. And, most probably we will fail on the next read,
if there are real problem with source.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190429090842.57910-4-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy a8389e315e block/backup: move to copy_bitmap with granularity
We are going to share this bitmap between backup and backup-top filter
driver, so let's share something more meaningful. It also simplifies
some calculations.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190429090842.57910-3-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy c2da3413c0 block/backup: simplify backup_incremental_init_copy_bitmap
Simplify backup_incremental_init_copy_bitmap using the function
bdrv_dirty_bitmap_next_dirty_area.

Note: move to job->len instead of bitmap size: it should not matter but
less code.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190429090842.57910-2-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy 8ac0f15f33 qcow2: do encryption in threads
Do encryption/decryption in threads, like it is already done for
compression. This improves asynchronous encrypted io.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190506142741.41731-9-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy 5447c3a03f qcow2: bdrv_co_pwritev: move encryption code out of the lock
Encryption will be done in threads, to take benefit of it, we should
move it out of the lock first.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190506142741.41731-8-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy f24196d388 qcow2: qcow2_co_preadv: improve locking
Background: decryption will be done in threads, to take benefit of it,
we should move it out of the lock first.

But let's go further: it turns out, that only
qcow2_get_cluster_offset() needs locking, so reduce locking to it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190506142741.41731-7-vsementsov@virtuozzo.com
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy 6f13a316dd qcow2-threads: split out generic path
Move generic part out of qcow2_co_do_compress, to reuse it for
encryption and rename things that would be shared with encryption path.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190506142741.41731-6-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy 0f5636c51c qcow2-threads: qcow2_co_do_compress: protect queuing by mutex
Drop dependence on AioContext lock.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190506142741.41731-5-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy 269062efc8 qcow2-threads: use thread_pool_submit_co
Use thread_pool_submit_co, instead of reinventing it here. Note, that
thread_pool_submit_aio() never returns NULL, so checking it was an
extra thing.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190506142741.41731-4-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy 56e2f1d898 qcow2: add separate file for threaded data processing functions
Move compression-on-threads to separate file. Encryption will be in it
too.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190506142741.41731-3-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy 9353db47c5 qcow2.h: add missing include
qcow2.h depends on block_int.h. Compilation isn't broken currently only
due to block_int.h always included before qcow2.h. Though, it seems
better to directly include block_int.h in qcow2.h.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190506142741.41731-2-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-28 20:30:55 +02:00
Max Reitz 9c3db310ff block/file-posix: Unaligned O_DIRECT block-status
Currently, qemu crashes whenever someone queries the block status of an
unaligned image tail of an O_DIRECT image:
$ echo > foo
$ qemu-img map --image-opts driver=file,filename=foo,cache.direct=on
Offset          Length          Mapped to       File
qemu-img: block/io.c:2093: bdrv_co_block_status: Assertion `*pnum &&
QEMU_IS_ALIGNED(*pnum, align) && align > offset - aligned_offset'
failed.

This is because bdrv_co_block_status() checks that the result returned
by the driver's implementation is aligned to the request_alignment, but
file-posix can fail to do so, which is actually mentioned in a comment
there: "[...] possibly including a partial sector at EOF".

Fix this by rounding up those partial sectors.

There are two possible alternative fixes:
(1) We could refuse to open unaligned image files with O_DIRECT
    altogether.  That sounds reasonable until you realize that qcow2
    does necessarily not fill up its metadata clusters, and that nobody
    runs qemu-img create with O_DIRECT.  Therefore, unpreallocated qcow2
    files usually have an unaligned image tail.

(2) bdrv_co_block_status() could ignore unaligned tails.  It actually
    throws away everything past the EOF already, so that sounds
    reasonable.
    Unfortunately, the block layer knows file lengths only with a
    granularity of BDRV_SECTOR_SIZE, so bdrv_co_block_status() usually
    would have to guess whether its file length information is inexact
    or whether the driver is broken.

Fixing what raw_co_block_status() returns is the safest thing to do.

There seems to be no other block driver that sets request_alignment and
does not make sure that it always returns aligned values.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-20 17:08:57 +02:00
Kevin Wolf 9ff7f0df87 blockjob: Propagate AioContext change to all job nodes
Block jobs require that all of the nodes the job is using are in the
same AioContext. Therefore all BdrvChild objects of the job propagate
.(can_)set_aio_context to all other job nodes, so that the switch is
checked and performed consistently even if both nodes are in different
subtrees.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-20 17:08:56 +02:00
Kevin Wolf 980b0f943a block: Add blk_set_allow_aio_context_change()
Some users (like block jobs) can tolerate an AioContext change for their
BlockBackend. Add a function that tells the BlockBackend that it can
allow changes.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-20 17:08:56 +02:00
Kevin Wolf 38475269d4 block: Implement .(can_)set_aio_ctx for BlockBackend
bdrv_try_set_aio_context() currently fails if a BlockBackend is attached
to a node because it doesn't implement the BdrvChildRole callbacks for
AioContext management.

We can allow changing the AioContext of monitor-owned BlockBackends as
long as no device is attached to them.

When setting the AioContext of the root node of a BlockBackend, we now
need to pass blk->root as an ignored child because we don't want the
root node to recursively call back into BlockBackend and execute
blk_do_set_aio_context() a second time.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-20 17:08:56 +02:00
Alberto Garcia 41ae31e3d7 block: Use BDRV_REQUEST_MAX_BYTES instead of BDRV_REQUEST_MAX_SECTORS
There are a few places in which we turn a number of bytes into sectors
in order to compare the result against BDRV_REQUEST_MAX_SECTORS
instead of using BDRV_REQUEST_MAX_BYTES directly.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-20 17:08:56 +02:00
Alberto Garcia b6c246942b qcow2: Define and use QCOW2_COMPRESSED_SECTOR_SIZE
When an L2 table entry points to a compressed cluster the space used
by the data is specified in 512-byte sectors. This size is independent
from BDRV_SECTOR_SIZE and is specific to the qcow2 file format.

The QCOW2_COMPRESSED_SECTOR_SIZE constant defined in this patch makes
this explicit.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-20 17:08:56 +02:00
Max Reitz 50ba5b2d99 block/file-posix: Truncate in xfs_write_zeroes()
XFS_IOC_ZERO_RANGE does not increase the file length:
$ touch foo
$ xfs_io -c 'zero 0 65536' foo
$ stat -c "size=%s, blocks=%b" foo
size=0, blocks=128

We do want writes beyond the EOF to automatically increase the file
length, however.  This is evidenced by the fact that iotest 061 is
broken on XFS since qcow2's check implementation checks for blocks
beyond the EOF.

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-20 17:08:56 +02:00
Peter Maydell 01807c8b0e Miscellaneous patches for 2019-05-13
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJc2RbiAAoJEDhwtADrkYZTfpAP/itXg+X9wRfMeerni6SAkgtz
 knaLtJNC7YvwH7W6TIkSf2QgMrg/nYzIXxxj6V+Y3Vdn4CD93C7ldItWkm33amlA
 SEhREHpAn4F3wg/LsOGYYgpuqF/wrUcZsmzudnslfMd3mM6Q9Q6J3q6mu8n1oRcl
 RRKlk++ElqyRdvmxFhddhPxk797Vuunh76vd3ARUFmzKs2n7CGkeBu+qbk41VqI9
 YtjmWHO6BDY5b01PvjuWPir6n1yJsYcpfo3ZElZvPf5jQHMmO6fGB3SZc/PIWegq
 gAVeoXtwhNm+nywMpIv1wHQMkvRDZW0wrurIQBc4VGpH1Pa90dR9FNVZ8r0OZqPB
 aErPCdC7ED73uzJwzXKTnLxY0XDgdhsAsW7lFggANs6YyewZNcbDaVhZWsopTTK/
 3jBbddIw2RsfHNQgXlFVVzjZJGHBNHxFjAFASCKcapUWQwDKU42kQrS1GqxG56NI
 Lgi8Ce+Q0GsVF4wme96Oa/8EMRfmNvsHMfWQvmqGqA1OACSOf2PSGCeD618A5gq6
 kV6wF4v5HdGFkc0x9Vr5ur7kv3eOhpzFzBM6XJXe3CyqnYrkNuBldkyGZBbrNY7G
 aW5sR26Is4m9i+7159cNB5LmnfQqtsscibkSC0UQiXcuWgevd6cdiF+0r1YuNp7C
 Faa2yPOHs4mHCjUwade9
 =yKfZ
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-misc-2019-05-13' into staging

Miscellaneous patches for 2019-05-13

# gpg: Signature made Mon 13 May 2019 08:04:02 BST
# gpg:                using RSA key 3870B400EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-misc-2019-05-13:
  Clean up decorations and whitespace around header guards
  Normalize header guard symbol definition.
  Clean up ill-advised or unusual header guards
  Clean up header guards that don't match their file name
  target/xtensa: Clean up core-isa.h header guards
  linux-user/nios2 linux-user/riscv: Clean up header guards
  authz: Normalize #include "authz/trace.h" to "trace.h"
  Use #include "..." for our own headers, <...> for others
  Clean up includes

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-05-13 13:55:13 +01:00
Markus Armbruster a8b991b52d Clean up ill-advised or unusual header guards
Leading underscores are ill-advised because such identifiers are
reserved.  Trailing underscores are merely ugly.  Strip both.

Our header guards commonly end in _H.  Normalize the exceptions.

Done with scripts/clean-header-guards.pl.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190315145123.28030-7-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
[Changes to slirp/ dropped, as we're about to spin it off]
2019-05-13 08:58:55 +02:00
Alberto Garcia 433e8e3b22 qcow2: Remove BDRVQcow2State.cluster_sectors
The last user of this field disappeared when we replace the
sector-based bdrv_write() with the byte-based bdrv_pwrite().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-10 16:45:40 +02:00
Alberto Garcia 2e11d7562a block: Remove bdrv_read() and bdrv_write()
No one is using these functions anymore, all callers have switched to
the byte-based bdrv_pread() and bdrv_pwrite()

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-10 16:45:40 +02:00
Alberto Garcia e5a0a6784a vvfat: Replace bdrv_{read,write}() with bdrv_{pread,pwrite}()
There's only a couple of bdrv_read() and bdrv_write() calls left in
the vvfat code, and they can be trivially replaced with the byte-based
bdrv_pread() and bdrv_pwrite().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-10 16:45:40 +02:00
Alberto Garcia d4f189713f vdi: Replace bdrv_{read,write}() with bdrv_{pread,pwrite}()
There's only a couple of bdrv_read() and bdrv_write() calls left in
the vdi code, and they can be trivially replaced with the byte-based
bdrv_pread() and bdrv_pwrite().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-10 16:45:40 +02:00
Alberto Garcia e3b4257d03 qcow2: Replace bdrv_write() with bdrv_pwrite()
There's only one bdrv_write() call left in the qcow2 code, and it can
be trivially replaced with the byte-based bdrv_pwrite().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-10 16:45:40 +02:00
Andrey Shinkevich 118f99442d block/io.c: fix for the allocation failure
On a file system used by the customer, fallocate() returns an error
if the block is not properly aligned. So, bdrv_co_pwrite_zeroes()
fails. We can handle that case the same way as it is done for the
unsupported cases, namely, call to bdrv_driver_pwritev() that writes
zeroes to an image for the unaligned chunk of the block.

Suggested-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1554474244-553661-1-git-send-email-andrey.shinkevich@virtuozzo.com
Message-Id: <1554474244-553661-1-git-send-email-andrey.shinkevich@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-05-10 10:53:21 +01:00
Alberto Garcia 6a63419980 commit: Use bdrv_append() in commit_start()
This function combines bdrv_set_backing_hd() and bdrv_replace_node()
so we can use it to simplify the code a bit in commit_start().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190403143748.9790-1-berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-07 17:14:21 +02:00
Max Reitz 21205c7c3b block/ssh: Implement .bdrv_dirname()
ssh_bdrv_dirname() is basically the generic bdrv_dirname(), except it
takes care not to silently chop off any query string (i.e.,
host_key_check).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Message-id: 20190225190828.17726-3-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-07 17:14:21 +02:00
Max Reitz b8c1f90118 block/ssh: Implement .bdrv_refresh_filename()
This requires some changes to keep iotests 104 and 207 working.

qemu-img info in 104 will now return a filename including the user name
and the port, which need to be filtered by adjusting REMOTE_TEST_DIR in
common.rc.  This additional information has to be marked optional,
however (which is simple as REMOTE_TEST_DIR is a regex), because
otherwise 197 and 215 would fail: They use it (indirectly) to filter
qemu-img create output which contains a backing filename they have
passed to it -- which probably does not contain a user name or port
number.

The problem in 207 is a nice one to have: qemu-img info used to return
json:{} filenames, but with this patch it returns nice plain ones.  We
now need to adjust the filtering to hide the user name (and port number
while we are at it).  The simplest way to do this is to include both in
iotests.remote_filename() so that bdrv_refresh_filename() will not
change it, and then iotests.img_info_log() will filter it correctly
automatically.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Message-id: 20190225190828.17726-2-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-07 17:14:21 +02:00
Andrey Shinkevich 444b82369b qcow2: discard bitmap when removed
Bitmap data may take a lot of disk space, so it's better to discard it
always.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Message-id: 1551346019-293202-1-git-send-email-andrey.shinkevich@virtuozzo.com
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[mreitz: Use the commit message proposed by Vladimir]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-07 17:14:21 +02:00
Vladimir Sementsov-Ogievskiy 54b10010eb qcow2-refcount: don't mask corruptions under internal errors
No reasons for not reporting found corruptions as corruptions in case
of some internal errors, especially in case of just failed to fix l2
entry (and in this case, missed corruptions may influence comparing
logic, when we calculate difference between corruptions fields of two
results)

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190227131433.197063-6-vsementsov@virtuozzo.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-07 17:14:21 +02:00
Vladimir Sementsov-Ogievskiy cbb51e9f93 qcow2-refcount: check_refcounts_l2: don't count fixed cluster as allocated
Do not count a cluster which is fixed to be ZERO as allocated.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190227131433.197063-5-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-07 17:14:21 +02:00
Vladimir Sementsov-Ogievskiy 1ef337b7a0 qcow2-refcount: check_refcounts_l2: reduce ignored overlaps
Reduce number of structures ignored in overlap check: when checking
active table ignore active tables, when checking inactive table ignore
inactive ones.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190227131433.197063-4-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-07 17:14:21 +02:00
Vladimir Sementsov-Ogievskiy a5fff8d4b4 qcow2-refcount: avoid eating RAM
qcow2_inc_refcounts_imrt() (through realloc_refcount_array()) can eat
an unpredictable amount of memory on corrupted table entries, which are
referencing regions far beyond the end of file.

Prevent this, by skipping such regions from further processing.

Interesting that iotest 138 checks exactly the behavior which we fix
here. So, change the test appropriately.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190227131433.197063-3-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-07 17:14:21 +02:00
Vladimir Sementsov-Ogievskiy 7e3e736cbd qcow2-refcount: fix check_oflag_copied
Increase corruptions_fixed only after successful fix.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190227131433.197063-2-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-07 17:14:21 +02:00
Vladimir Sementsov-Ogievskiy 54277a2aab block/qed: add missed coroutine_fn markers
qed_read_table and qed_write_table use coroutine-only interfaces but
are not marked coroutine_fn. Happily, they are called only from
coroutine context, so we only need to add missed markers.

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-04-30 15:29:00 +02:00
Alberto Garcia 065abf9f2b commit: Make base read-only if there is an early failure
You can reproduce this by passing an invalid filter-node-name (like
"1234") to block-commit. In this case the base image is put in
read-write mode but is never reset back to read-only.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-04-30 15:29:00 +02:00
Vladimir Sementsov-Ogievskiy f4326aefcf block/stream: use buffer-based io
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-04-30 15:29:00 +02:00
Vladimir Sementsov-Ogievskiy 08b6261f34 block/commit: use buffer-based io
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-04-30 15:29:00 +02:00
Vladimir Sementsov-Ogievskiy 607dbdc4e0 block/backup: use buffer-based io
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-04-30 15:29:00 +02:00
Vladimir Sementsov-Ogievskiy a4072543cc block/parallels: use buffer-based io
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-04-30 15:29:00 +02:00
Vladimir Sementsov-Ogievskiy 696e8cb292 block/qed: use buffer-based io
Move to _co_ versions of io functions qed_read_table() and
qed_write_table(), as we use qemu_co_mutex_unlock()
anyway.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-04-30 15:29:00 +02:00
Vladimir Sementsov-Ogievskiy 4ed3e0c486 block/qcow: use buffer-based io
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-04-30 15:29:00 +02:00
Vladimir Sementsov-Ogievskiy b00cb15bda block/qcow2: use buffer-based io
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-04-30 15:29:00 +02:00
Alberto Garcia e1f4a37a49 qcow2: Fix error handling in the compression code
This patch fixes a few things in the way error codes are handled in
the qcow2 compression code:

a) qcow2_co_pwritev_compressed() expects qcow2_co_compress() to only
   return -1 or -2 on failure, but this is not correct. Since the
   change from qcow2_compress() to qcow2_co_compress() in commit
   ceb029cd6f the new code can also return -EINVAL (although
   there does not seem to exist any code path that would cause that
   error in the current implementation).

b) -1 and -2 are ad-hoc error codes defined in qcow2_compress().
   This patch replaces them with standard constants from errno.h.

c) Both qcow2_compress() and qcow2_co_do_compress() return a negative
   value on failure, but qcow2_co_pwritev_compressed() stores the
   value in an unsigned data type.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-04-30 15:29:00 +02:00
Kevin Wolf db04524f82 qcow2: Fix qcow2_make_empty() with external data file
make_completely_empty() is an optimisated path for bdrv_make_empty()
where completely new metadata is created inside the image file instead
of going through all clusters and discarding them. For an external data
file, however, we actually need to do discard operations on the data
file; just overwriting the qcow2 file doesn't get rid of the data.

The necessary slow path with an explicit discard operation already
exists for other cases. Use it for external data files, too.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2019-04-30 15:29:00 +02:00
Kevin Wolf 718c0fce2f qcow2: Fix full preallocation with external data file
preallocate_co() already gave the data file the full size without
forwarding the requested preallocation mode to the protocol. When
bdrv_co_truncate() was called later with the preallocation mode, the
file didn't actually grow any more, so the data file stayed unallocated
even if full preallocation was requested.

Pass the right preallocation mode to preallocate_co() and remove the
second bdrv_co_truncate() to fix this. As a side effect, the ugly
one-byte write in preallocate_co() is replaced with a truncate call,
now leaving the last block unallocated on the protocol level as it
should be.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2019-04-30 15:29:00 +02:00
Kevin Wolf 360bd07471 qcow2: Add errp to preallocate_co()
We'll add a bdrv_co_truncate() call in the next patch which can return
an Error that we don't want to discard. So add an errp parameter to
preallocate_co().

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2019-04-30 15:29:00 +02:00
Kevin Wolf f29fbf7c6b qcow2: Avoid COW during metadata preallocation
Limiting the allocation to INT_MAX bytes isn't particularly clever
because it means that the final cluster will be a partial cluster which
will be completed through a COW operation. This results in unnecessary
data read and write requests which lead to an unwanted non-sparse
filesystem block for metadata preallocation.

Align the maximum allocation size down to the cluster size to avoid this
situation.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2019-04-30 15:29:00 +02:00
Eric Blake de38b5005e qemu-img: Saner printing of large file sizes
Disk sizes close to INT64_MAX cause overflow, for some pretty
ridiculous output:

  $ ./nbdkit -U - memory size=$((2**63 - 512)) --run 'qemu-img info $nbd'
  image: nbd+unix://?socket=/tmp/nbdkitHSAzNz/socket
  file format: raw
  virtual size: -8388607T (9223372036854775296 bytes)
  disk size: unavailable

But there's no reason to have two separate implementations of integer
to human-readable abbreviation, where one has overflow and stops at
'T', while the other avoids overflow and goes all the way to 'E'. With
this patch, the output now claims 8EiB instead of -8388607T, which
really is the correct rounding of largest file size supported by qemu
(we could go 511 bytes larger if we used byte-accurate sizing instead
of rounding up to the next sector boundary, but that wouldn't change
the human-readable result).

Quite a few iotests need updates to expected output to match.

Reported-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Max Reitz <mreitz@redhat.com>
2019-04-30 15:29:00 +02:00
Stefano Garzarella 0cb98af218 block/vhdx: Use IEC binary prefixes for size constants
Using IEC binary prefixes in order to make the code more readable,
with the exception of DEFAULT_LOG_SIZE because it's passed to
stringify().

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-04-30 15:29:00 +02:00
Stefano Garzarella e9991e29ea block/vhdx: Remove redundant IEC binary prefixes definition
IEC binary prefixes are already defined in "qemu/units.h",
so we can remove redundant definitions in "block/vhdx.h".

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-04-30 15:29:00 +02:00
Sam Eiderman 7502be838e vmdk: Set vmdk parent backing_format to vmdk
Commit b69864e5a ("vmdk: Support version=3 in VMDK descriptor files")
fixed the probe function to correctly guess vmdk descriptors with
version=3.

This solves the issue where vmdk snapshot with parent vmdk descriptor
containing "version=3" would be treated as raw instead vmdk.

In the future case where a new vmdk version is introduced, we will again
experience this issue, even if the user will provide "-f vmdk" it will
only apply to the tip image and not to the underlying "misprobed" parent
image.

The code in vmdk.c already assumes that the backing file of vmdk must be
vmdk (see vmdk_is_cid_valid which returns 0 if backing file is not
vmdk).

So let's make it official by supplying the backing_format as vmdk.

Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Reviewed-By: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
Signed-off-by: Shmuel Eiderman <shmuel.eiderman@oracle.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <fam@euphon.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-04-30 15:29:00 +02:00
Zhengui li 126734c4f7 vpc: unlock Coroutine lock to make IO submit Concurrently
Concurrent IO becomes serial IO because of the qemu Coroutine lock,
which reduce IO performance severely.

So unlock Coroutine lock before bdrv_co_pwritev and
bdrv_co_preadv to fix it.

Signed-off-by: Zhengui li <lizhengui@huawei.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-04-30 15:29:00 +02:00
Markus Armbruster e1ce7d747b block/qapi: Clean up how we print to monitor or stdout
bdrv_snapshot_dump(), bdrv_image_info_specific_dump(),
bdrv_image_info_dump() and their helpers take an fprintf()-like
callback and a FILE * to pass to it.

hmp.c passes monitor_printf() cast to fprintf_function and the current
monitor cast to FILE *.

qemu-img.c and qemu-io-cmds.c pass fprintf and stdout.

The type-punning is technically undefined behaviour, but works in
practice.  Clean up: drop the callback, and call qemu_printf()
instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417191805.28198-8-armbru@redhat.com>
2019-04-18 22:18:59 +02:00
Markus Armbruster 6b3048cee0 block/ssh: Do not report read/write/flush errors to the user
Callbacks ssh_co_readv(), ssh_co_writev(), ssh_co_flush() report
errors to the user with error_printf().  They shouldn't, it's their
caller's job.  Replace by a suitable trace point.  While there, drop
the unreachable !s->sftp case.

Perhaps we should convert this part of the block driver interface to
Error, so block drivers can pass more detail to their callers.  Not
today.

Cc: "Richard W.M. Jones" <rjones@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190417190641.26814-3-armbru@redhat.com>
2019-04-17 21:21:49 +02:00
Kevin Wolf 93e32b3e20 qcow2: Fix preallocation bdrv_pwrite to wrong file
With an external data file, preallocate_co() must write the final byte
to the external data file, not to the qcow2 image file.

This is harmless for preallocation of newly created images (only the
qcow2 file size is increased to the virtual disk size while it should be
much smaller), but with preallocated resize, it could in theory cause
visible corruption if the metadata of the image is larger than the data
(e.g. lots of bitmaps).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2019-04-16 16:23:24 +02:00
Alberto Garcia 20509c4b8b block: freeze the backing chain earlier in stream_start()
Commit 6585493369 added code to freeze
the backing chain from 'top' to 'base' for the duration of the
block-stream job.

The problem is that the freezing happens too late in stream_start():
during the bdrv_reopen_set_read_only() call earlier in that function
another job can jump in and remove the base image. If that happens we
have an invalid chain and QEMU crashes.

This patch puts the bdrv_freeze_backing_chain() call at the beginning
of the function.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-04-02 12:04:44 +02:00
Vladimir Sementsov-Ogievskiy 696aaaed57 block/file-posix: do not fail on unlock bytes
bdrv_replace_child() calls bdrv_check_perm() with error_abort on
loosening permissions. However file-locking operations may fail even
in this case, for example on NFS. And this leads to Qemu crash.

Let's avoid such errors. Note, that we ignore such things anyway on
permission update commit and abort.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-04-02 12:04:44 +02:00
Stefano Garzarella de23e72bb7 block/gluster: limit the transfer size to 512 MiB
Several versions of GlusterFS (3.12? -> 6.0.1) fail when the
transfer size is greater or equal to 1024 MiB, so we are
limiting the transfer size to 512 MiB to avoid this rare issue.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1691320
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-04-02 12:04:44 +02:00
Eric Blake 75d34eb98c nbd/client: Trace server noncompliance on structured reads
Just as we recently added a trace for a server sending block status
that doesn't match the server's advertised minimum block alignment,
let's do the same for read chunks.  But since qemu 3.1 is such a
server (because it advertised 512-byte alignment, but when serving a
file that ends in data but is not sector-aligned, NBD_CMD_READ would
detect a mid-sector change between data and hole at EOF and the
resulting read chunks are unaligned), we don't want to change our
behavior of otherwise tolerating unaligned reads.

Note that even though we fixed the server for 4.0 to advertise an
actual block alignment (which gets rid of the unaligned reads at EOF
for posix files), we can still trigger it via other means:

$ qemu-nbd --image-opts driver=blkdebug,align=512,image.driver=file,image.filename=/path/to/non-aligned-file

Arguably, that is a bug in the blkdebug block status function, for
leaking a block status that is not aligned. It may also be possible to
observe issues with a backing layer with smaller alignment than the
active layer, although so far I have been unable to write a reliable
iotest for that scenario.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190330165349.32256-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-04-01 08:58:04 -05:00
Eric Blake 4841211e0d block: Add bdrv_get_request_alignment()
The next patch needs access to a device's minimum permitted
alignment, since NBD wants to advertise this to clients. Add
an accessor function, borrowing from blk_get_max_transfer()
for accessing a backend's block limits.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190329042750.14704-6-eblake@redhat.com>
2019-04-01 08:46:52 -05:00
Eric Blake 9cf638508c nbd/client: Support qemu-img convert from unaligned size
If an NBD server advertises a size that is not a multiple of a sector,
the block layer rounds up that size, even though we set info.size to
the exact byte value sent by the server. The block layer then proceeds
to let us read or query block status on the hole that it added past
EOF, which the NBD server is unlikely to be happy with. Fortunately,
qemu as a server never advertizes an unaligned size, so we generally
don't run into this problem; but the nbdkit server makes it easy to
test:

$ printf %1000d 1 > f1
$ ~/nbdkit/nbdkit -fv file f1 & pid=$!
$ qemu-img convert -f raw nbd://localhost:10809 f2
$ kill $pid
$ qemu-img compare f1 f2

Pre-patch, the server attempts a 1024-byte read, which nbdkit
rightfully rejects as going beyond its advertised 1000 byte size; the
conversion fails and the output files differ (not even the first
sector is copied, because qemu-img does not follow ddrescue's habit of
trying smaller reads to get as much information as possible in spite
of errors). Post-patch, the client's attempts to read (and query block
status, for new enough nbdkit) are properly truncated to the server's
length, with sane handling of the hole the block layer forced on
us. Although f2 ends up as a larger file (1024 bytes instead of 1000),
qemu-img compare shows the two images to have identical contents for
display to the guest.

I didn't add iotests coverage since I didn't want to add a dependency
on nbdkit in iotests. I also did NOT patch write, trim, or write
zeroes - these commands continue to fail (usually with ENOSPC, but
whatever the server chose), because we really can't write to the end
of the file, and because 'qemu-img convert' is the most common case
where we care about being tolerant (which is read-only). Perhaps we
could truncate the request if the client is writing zeros to the tail,
but that seems like more work, especially if the block layer is fixed
in 4.1 to track byte-accurate sizing (in which case this patch would
be reverted as unnecessary).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190329042750.14704-5-eblake@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
2019-04-01 08:32:44 -05:00
Eric Blake a62a85ef5c nbd/client: Report offsets in bdrv_block_status
It is desirable for 'qemu-img map' to have the same output for a file
whether it is served over file or nbd protocols. However, ever since
we implemented block status for NBD (2.12), the NBD protocol forgot to
inform the block layer that as the final layer in the chain, the
offset is valid; without an offset, the human-readable form of
qemu-img map gives up with the unhelpful:

$ nbdkit -U - data data="1" size=512 --run 'qemu-img map $nbd'
Offset          Length          Mapped to       File
qemu-img: File contains external, encrypted or compressed clusters.

The --output=json form always works, because it is reporting the
lower-level bdrv_block_status results directly rather than trying to
filter out sparse ranges for human consumption - but now it also
shows the offset member.

With this patch, the human output changes to:

Offset          Length          Mapped to       File
0               0x200           0               nbd+unix://?socket=/tmp/nbdkitOxeoLa/socket

This change is observable to several iotests.

Fixes: 78a33ab5
Reported-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190329042750.14704-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-03-30 20:52:29 -05:00
Eric Blake 7da537f70d nbd/client: Lower min_block for block-status, unaligned size
We have a latent bug in our NBD client code, tickled by the brand new
nbdkit 1.11.10 block status support:

$ nbdkit --filter=log --filter=truncate -U - \
           data data="1" size=511 truncate=64K logfile=/dev/stdout \
           --run 'qemu-img convert $nbd /var/tmp/out'
...
qemu-img: block/io.c:2122: bdrv_co_block_status: Assertion `*pnum && QEMU_IS_ALIGNED(*pnum, align) && align > offset - aligned_offset' failed.

The culprit? Our implementation of .bdrv_co_block_status can return
unaligned block status for any server that operates with a lower
actual alignment than what we tell the block layer in
request_alignment, in violation of the block layer's constraints. To
date, we've been unable to trip the bug, because qemu as NBD server
always advertises block sizing (at which point it is a server bug if
the server sends unaligned status - although qemu 3.1 is such a server
and I've sent separate patches for 4.0 both to get the server to obey
the spec, and to let the client to tolerate server oddities at EOF).

But nbdkit does not (yet) advertise block sizing, and therefore is not
in violation of the spec for returning block status at whatever
boundaries it wants, and those unaligned results can occur anywhere
rather than just at EOF. While we are still wise to avoid sending
sub-sector read/write requests to a server of unknown origin, we MUST
consider that a server telling us block status without an advertised
block size is correct.  So, we either have to munge unaligned answers
from the server into aligned ones that we hand back to the block
layer, or we have to tell the block layer about a smaller alignment.

Similarly, if the server advertises an image size that is not
sector-aligned, we might as well assume that the server intends to let
us access those tail bytes, and therefore supports a minimum block
size of 1, regardless of whether the server supports block status
(although we still need more patches to fix the problem that with an
unaligned image, we can send read or block status requests that exceed
EOF to the server). Again, qemu as server cannot trip this problem
(because it rounds images to sector alignment), but nbdkit advertised
unaligned size even before it gained block status support.

Solve both alignment problems at once by using better heuristics on
what alignment to report to the block layer when the server did not
give us something to work with. Note that very few NBD servers
implement block status (to date, only qemu and nbdkit are known to do
so); and as the NBD spec mentioned block sizing constraints prior to
documenting block status, it can be assumed that any future
implementations of block status are aware that they must advertise
block size if they want a minimum size other than 1.

We've had a long history of struggles with picking the right alignment
to use in the block layer, as evidenced by the commit message of
fd8d372d (v2.12) that introduced the current choice of forced 512-byte
alignment.

There is no iotest coverage for this fix, because qemu can't provoke
it, and I didn't want to make test 241 dependent on nbdkit.

Fixes: fd8d372d
Reported-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190329042750.14704-3-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
2019-03-30 20:52:19 -05:00
Eric Blake 737d3f5244 nbd-client: Work around server BLOCK_STATUS misalignment at EOF
The NBD spec is clear that a server that advertises a minimum block
size should reply to NBD_CMD_BLOCK_STATUS with extents aligned
accordingly. However, we know that the qemu NBD server implementation
has had a corner-case bug where it is not compliant with the spec,
present since the introduction of NBD_CMD_BLOCK_STATUS in qemu 2.12
(and unlikely to be patched in time for 4.0). Namely, when qemu is
serving a file that is not a multiple of 512 bytes, it rounds the size
advertised over NBD up to the next sector boundary (someday, I'd like
to fix that to be byte-accurate, but it's a much bigger audit not
appropriate for this release); yet if the final sector contains data
prior to EOF, lseek(SEEK_HOLE) will point to the implicit hole
mid-sector which qemu then reported over NBD.

We are well within our rights to hang up on a server that can't follow
the spec, but it is more useful to try and keep the connection alive
in spite of the problem. Do so by tracing a message about the problem,
and then either truncating the request back to an aligned boundary (if
it covered more than the final sector) or widening it out to the full
boundary with a forced status of data (since truncating would result
in 0 bytes, but we have to make progress, and valid since data is a
default-safe answer). And in practice, since the problem only happens
on a sector that starts with data and ends with a hole, we are going
to want to read that full sector anyway (where qemu as the server
fills in the tail beyond EOF with appropriate NUL bytes).

Easy reproduction:
$ printf %1000d 1 > file
$ qemu-nbd -f raw -t file & pid=$!
$ qemu-img map --output=json -f raw nbd://localhost:10809
qemu-img: Could not read file metadata: Invalid argument
$ kill $pid

where the patched version instead succeeds with:
[{ "start": 0, "length": 1024, "depth": 0, "zero": false, "data": true}]

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190326171317.4036-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-03-30 10:06:08 -05:00
Eric Blake ebd82cd872 nbd: Permit simple error to NBD_CMD_BLOCK_STATUS
The NBD spec is clear that when structured replies are active, a
simple error reply is acceptable to any command except for
NBD_CMD_READ.  However, we were mistakenly requiring structured errors
for NBD_CMD_BLOCK_STATUS, and hanging up on a server that gave a
simple error (since qemu does not behave as such a server, we didn't
notice the problem until now).  Broken since its introduction in
commit 78a33ab5 (v2.12).

Noticed while debugging a separate failure reported by nbdkit while
working out its initial implementation of BLOCK_STATUS, although it
turns out that nbdkit also chose to send structured error replies for
BLOCK_STATUS, so I had to manually provoke the situation by hacking
qemu's server to send a simple error reply:

| diff --git i/nbd/server.c w/nbd/server.c
| index fd013a2817a..833288d7c45 100644
| 00--- i/nbd/server.c
| +++ w/nbd/server.c
| @@ -2269,6 +2269,8 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
|                                        "discard failed", errp);
|
|      case NBD_CMD_BLOCK_STATUS:
| +        return nbd_co_send_simple_reply(client, request->handle, ENOMEM,
| +                                        NULL, 0, errp);
|          if (!request->len) {
|              return nbd_send_generic_reply(client, request->handle, -EINVAL,
|                                            "need non-zero length", errp);
|

Signed-off-by: Eric Blake <eblake@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <20190325190104.30213-3-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-03-30 10:06:08 -05:00
Eric Blake b29f3a3d2a nbd: Don't lose server's error to NBD_CMD_BLOCK_STATUS
When the server replies with a (structured [*]) error to
NBD_CMD_BLOCK_STATUS, without any extent information sent first, the
client code was blindly throwing away the server's error code and
instead telling the caller that EIO occurred.  This has been broken
since its introduction in 78a33ab5 (v2.12, where we should have called:
   error_setg(&local_err, "Server did not reply with any status extents");
   nbd_iter_error(&iter, false, -EIO, &local_err);
to declare the situation as a non-fatal error if no earlier error had
already been flagged, rather than just blindly slamming iter.err and
iter.ret), although it is more noticeable since commit 7f86068d, which
actually tries hard to preserve the server's code thanks to a separate
iter.request_ret.

[*] The spec is clear that the server is also permitted to reply with
a simple error, but that's a separate fix.

I was able to provoke this scenario with a hack to the server, then
seeing whether ENOMEM makes it back to the caller:

| diff --git a/nbd/server.c b/nbd/server.c
| index fd013a2817a..29c7995de02 100644
| --- a/nbd/server.c
| +++ b/nbd/server.c
| @@ -2269,6 +2269,8 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
|                                        "discard failed", errp);
|
|      case NBD_CMD_BLOCK_STATUS:
| +        return nbd_send_generic_reply(client, request->handle, -ENOMEM,
| +                                      "no status for you today", errp);
|          if (!request->len) {
|              return nbd_send_generic_reply(client, request->handle, -EINVAL,
|                                            "need non-zero length", errp);
| --

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190325190104.30213-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-03-30 10:06:08 -05:00
Eric Blake a39286dd61 nbd: Tolerate some server non-compliance in NBD_CMD_BLOCK_STATUS
The NBD spec states that NBD_CMD_FLAG_REQ_ONE (which we currently
always use) should not reply with an extent larger than our request,
and that the server's response should be exactly one extent. Right
now, that means that if a server sends more than one extent, we treat
the server as broken, fail the block status request, and disconnect,
which prevents all further use of the block device. But while good
software should be strict in what it sends, it should be tolerant in
what it receives.

While trying to implement NBD_CMD_BLOCK_STATUS in nbdkit, we
temporarily had a non-compliant server sending too many extents in
spite of REQ_ONE. Oddly enough, 'qemu-img convert' with qemu 3.1
failed with a somewhat useful message:
  qemu-img: Protocol error: invalid payload for NBD_REPLY_TYPE_BLOCK_STATUS

which then disappeared with commit d8b4bad8, on the grounds that an
error message flagged only at the time of coroutine teardown is
pointless, and instead we should rely on the actual failed API to
report an error - in other words, the 3.1 behavior was masking the
fact that qemu-img was not reporting an error. That has since been
fixed in the previous patch, where qemu-img convert now fails with:
  qemu-img: error while reading block status of sector 0: Invalid argument

But even that is harsh.  Since we already partially relaxed things in
commit acfd8f7a to tolerate a server that exceeds the cap (although
that change was made prior to the NBD spec actually putting a cap on
the extent length during REQ_ONE - in fact, the NBD spec change was
BECAUSE of the qemu behavior prior to that commit), it's not that much
harder to argue that we should also tolerate a server that sends too
many extents.  But at the same time, it's nice to trace when we are
being tolerant of server non-compliance, in order to help server
writers fix their implementations to be more portable (if they refer
to our traces, rather than just stderr).

Reported-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190323212639.579-3-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-03-30 10:06:08 -05:00
Kevin Wolf 738301e117 file-posix: Support BDRV_REQ_NO_FALLBACK for zero writes
We know that the kernel implements a slow fallback code path for
BLKZEROOUT, so if BDRV_REQ_NO_FALLBACK is given, we shouldn't call it.
The other operations we call in the context of .bdrv_co_pwrite_zeroes
should usually be quick, so no modification should be needed for them.
If we ever notice that there are additional problematic cases, we can
still make these conditional as well.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Eric Blake <eblake@redhat.com>
2019-03-26 11:37:51 +01:00
Kevin Wolf 80f5c33ff3 block: Advertise BDRV_REQ_NO_FALLBACK in filter drivers
Filter drivers that support .bdrv_co_pwrite_zeroes can safely advertise
BDRV_REQ_NO_FALLBACK because they just forward the request flags to
their child node.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Eric Blake <eblake@redhat.com>
2019-03-26 11:37:51 +01:00
Kevin Wolf fe0480d629 block: Add BDRV_REQ_NO_FALLBACK
For qemu-img convert, we want an operation that zeroes out the whole
image if this can be done efficiently, but that returns an error
otherwise so we don't write explicit zeroes and immediately overwrite
them with the real data, potentially doubling the amount of data to be
written.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Eric Blake <eblake@redhat.com>
2019-03-26 11:37:51 +01:00
Kevin Wolf 48ce986096 block: Remove error messages in bdrv_make_zero()
There is only a single caller of bdrv_make_zero(), which is qemu-img
convert. If the function fails, we just fall back to a different method
of zeroing out blocks on the target image. There is no good reason to
print error messages on stderr when the higher level operation will
actually succeed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Eric Blake <eblake@redhat.com>
2019-03-26 11:37:51 +01:00
Markus Armbruster a9779a3ab0 trace-events: Delete unused trace points
Tracked down with cleanup-trace-events.pl.  Funnies requiring manual
post-processing:

* block.c and blockdev.c trace points are in block/trace-events.

* hw/block/nvme.c uses the preprocessor to hide its trace point use
  from cleanup-trace-events.pl.

* include/hw/xen/xen_common.h trace points are in hw/xen/trace-events.

* net/colo-compare and net/filter-rewriter.c use pseudo trace points
  colo_compare_udp_miscompare and colo_filter_rewriter_debug to guard
  debug code.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20190314180929.27722-5-armbru@redhat.com
Message-Id: <20190314180929.27722-5-armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-03-22 16:18:07 +00:00
Markus Armbruster 500016e5db trace-events: Shorten file names in comments
We spell out sub/dir/ in sub/dir/trace-events' comments pointing to
source files.  That's because when trace-events got split up, the
comments were moved verbatim.

Delete the sub/dir/ part from these comments.  Gets rid of several
misspellings.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190314180929.27722-3-armbru@redhat.com
Message-Id: <20190314180929.27722-3-armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-03-22 16:18:07 +00:00
Alberto Garcia 782b9d06bf block: Make bdrv_{copy_on_read,crypto_luks,replication} static
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-19 15:49:29 +01:00
Sam Eiderman b69864e5a8 vmdk: Support version=3 in VMDK descriptor files
Commit 509d39aa22 added support for read
only VMDKs of version 3.

This commit fixes the probe function to correctly handle descriptors of
version 3.

This commit has two effects:
    1. We no longer need to supply '-f vmdk' when pointing to descriptor
       files of version 3 in qemu/qemu-img command line arguments.
    2. This fixes the scenario where a VMDK points to a parent version 3
       descriptor file which is being probed as "raw" instead of "vmdk".

Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Shmuel Eiderman <shmuel.eiderman@oracle.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-19 15:49:29 +01:00
Kevin Wolf a0cf83639c qcow2: Fix data file error condition in qcow2_co_create()
We were trying to check whether bdrv_open_blockdev_ref() returned
success, but accidentally checked the wrong variable. Spotted by
Coverity (CID 1399703).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2019-03-19 15:49:29 +01:00
Sergio Lopez 5e771752a1 mirror: Confirm we're quiesced only if the job is paused or cancelled
While child_job_drained_begin() calls to job_pause(), the job doesn't
actually transition between states until it runs again and reaches a
pause point. This means bdrv_drained_begin() may return with some jobs
using the node still having 'busy == true'.

As a consequence, block_job_detach_aio_context() may get into a
deadlock, waiting for the job to be actually paused, while the coroutine
servicing the job is yielding and doesn't get the opportunity to get
scheduled again. This situation can be reproduced by issuing a
'block-commit' immediately followed by a 'device_del'.

To ensure bdrv_drained_begin() only returns when the jobs have been
paused, we change mirror_drained_poll() to only confirm it's quiesced
when job->paused == true and there aren't any in-flight requests, except
if we reached that point by a drained section initiated by the
mirror/commit job itself.

The other block jobs shouldn't need any changes, as the default
drained_poll() behavior is to only confirm it's quiesced if the job is
not busy or completed.

Signed-off-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-19 15:49:29 +01:00
Peter Maydell dbbc277510 Pull request
* Add 'drop-cache=on|off' option to file-posix.c.  The default is on.
    Disabling the option fixes a QEMU 3.0.0 performance regression when live
    migrating on the same host with cache.direct=off.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJciOSEAAoJEJykq7OBq3PIVSUIAI6r2Mgoi+no4nle8Jf2nZ+W
 EnQXnNEFyJA0lKRtqQ2UILD9udVdKd/L1PZu5k/Il/Ralto9Yf3+62brekI7rsss
 c3Qusu4LUK6jom2RslRjRIaJ9GilQi/jWezKV/O0VlcsMVemgVHX008EIR+ea1U4
 H0/u2kfu04PciKQ5MR2+6aacu9bfmyH1yM2no+aMN5dDu/38PV6JEsf0Zl2agowg
 opGepJ7YiDQsxH9IBXrbfm38mBrrY0K2vFzAb9BzTHfBPotGMNIZNJNM2FChRfoM
 sTjOIpZz3NDwPEUPQPZxp+7YKRFFYfse1oHtpyh4n1rMQksB019SCGlP9TBhrF0=
 =CH5G
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

Pull request

 * Add 'drop-cache=on|off' option to file-posix.c.  The default is on.
   Disabling the option fixes a QEMU 3.0.0 performance regression when live
   migrating on the same host with cache.direct=off.

# gpg: Signature made Wed 13 Mar 2019 11:07:48 GMT
# gpg:                using RSA key 9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/block-pull-request:
  file-posix: add drop-cache=on|off option

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-03-14 09:34:51 +00:00
Peter Maydell 523a2a42c3 Pull request
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+ber27ys35W+dsvQfe+BBqr8OQ4FAlyIFSwACgkQfe+BBqr8
 OQ7wghAAm16eCEr57oTO7QXR3y8uVFsKqXBn9cNH6nbrFp2PUQSglwMDKBls1Z5m
 olF23X/JaqSlSmkL9BBuzDZ6Up+kkHKuxPq4/5RKXfiDI0pr3R0eqts0COAlaN9q
 Bew3ipj99m8gzMi2093AW4+Ob0N3658fuDTGLe1M1Uoy7CEg1QJ7rVOBBEui7vIl
 RbZ8l/Zmb4ldNpB3lnE4Nh9ue8fy0RAj3Nai161nCnNeXNF/VzD3Ye8bojSBbnux
 PIMX6/RWmykX4feIf9QP8apDpxX4HkyuPq5EdwT9PD8PwdyXPAXZtsYUNCuNtQuk
 n5VKFVgFYgqUclBeVHmrMYPU4K4iCFQp4/Fua7wzPEC0iG05NiiDv91oVkEJCp3L
 ManHeuGfNLCcXaIntKZhuJl1cK8yMM3yDww6/pPTehrPjcyvKa0NOqhQBExektcD
 R6q7maJRzFaxSxdcs+Zzuog9zESvH1mlJxQCKzeYhAP0kkxInyTELE/Vbx37xuqR
 RFfZYyVQ6x87Q/sxHx4EMiV97WUM8elZOQdSEC/okt5WUUNpgIu0WF9nSQ1VKZ8C
 CZmv5xh9ogfwvB/kOm6IVwNkLvVagJQcLwddORI5LLXLbSIUcuwVSuyMp/7iDtQ/
 hnHkGs2mIJ2JUYbSSNsSJNs6oTurn8eTFCeGoYKJgd9l4QxaThU=
 =ekU+
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/jnsnow/tags/bitmaps-pull-request' into staging

Pull request

# gpg: Signature made Tue 12 Mar 2019 20:23:08 GMT
# gpg:                using RSA key F9B7ABDBBCACDF95BE76CBD07DEF8106AAFC390E
# gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>" [full]
# Primary key fingerprint: FAEB 9711 A12C F475 812F  18F2 88A9 064D 1835 61EB
#      Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76  CBD0 7DEF 8106 AAFC 390E

* remotes/jnsnow/tags/bitmaps-pull-request: (22 commits)
  tests/qemu-iotests: add bitmap resize test 246
  block/qcow2-bitmap: Allow resizes with persistent bitmaps
  block/qcow2-bitmap: Don't check size for IN_USE bitmap
  docs/interop/qcow2: Improve bitmap flag in_use specification
  bitmaps: Fix typo in function name
  block/dirty-bitmaps: implement inconsistent bit
  block/dirty-bitmaps: disallow busy bitmaps as merge source
  block/dirty-bitmaps: prohibit removing readonly bitmaps
  block/dirty-bitmaps: prohibit readonly bitmaps for backups
  block/dirty-bitmaps: add block_dirty_bitmap_check function
  block/dirty-bitmap: add inconsistent status
  block/dirty-bitmaps: add inconsistent bit
  iotests: add busy/recording bit test to 124
  blockdev: remove unused paio parameter documentation
  block/dirty-bitmaps: move comment block
  block/dirty-bitmaps: unify qmp_locked and user_locked calls
  block/dirty-bitmap: explicitly lock bitmaps with successors
  nbd: change error checking order for bitmaps
  block/dirty-bitmap: change semantics of enabled predicate
  block/dirty-bitmap: remove set/reset assertions against enabled bit
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

# Conflicts:
#	tests/qemu-iotests/group
2019-03-13 17:30:34 +00:00
Stefan Hajnoczi f357fcd890 file-posix: add drop-cache=on|off option
Commit dd577a26ff ("block/file-posix:
implement bdrv_co_invalidate_cache() on Linux") introduced page cache
invalidation so that cache.direct=off live migration is safe on Linux.

The invalidation takes a significant amount of time when the file is
large and present in the page cache.  Normally this is not the case for
cross-host live migration but it can happen when migrating between QEMU
processes on the same host.

On same-host migration we don't need to invalidate pages for correctness
anyway, so an option to skip page cache invalidation is useful.  I
investigated optimizing invalidation and detecting same-host migration,
but both are hard to achieve so a user-visible option will suffice.

As a bonus this option means that the cache invalidation feature will
now be detectable by libvirt via QMP schema introspection.

Suggested-by: Neil Skrypuch <neil@tembosocial.com>
Tested-by: Neil Skrypuch <neil@tembosocial.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190307164941.3322-1-stefanha@redhat.com
Message-Id: <20190307164941.3322-1-stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-03-13 10:54:55 +00:00
Alberto Garcia 5019aece2a block: Remove the AioContext parameter from bdrv_reopen_multiple()
This parameter has been unused since 1a63a90750

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 20:30:14 +01:00
Alberto Garcia 8a2ce0bc1e block: Add a 'mutable_opts' field to BlockDriver
If we reopen a BlockDriverState and there is an option that is present
in bs->options but missing from the new set of options then we have to
return an error unless the driver is able to reset it to its default
value.

This patch adds a new 'mutable_opts' field to BlockDriver. This is
a list of runtime options that can be modified during reopen. If an
option in this list is unspecified on reopen then it must be reset (or
return an error).

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 20:30:14 +01:00
Alberto Garcia 077e8e2018 block: Add 'keep_old_opts' parameter to bdrv_reopen_queue()
The bdrv_reopen_queue() function is used to create a queue with
the BDSs that are going to be reopened and their new options. Once
the queue is ready bdrv_reopen_multiple() is called to perform the
operation.

The original options from each one of the BDSs are kept, with the new
options passed to bdrv_reopen_queue() applied on top of them.

For "x-blockdev-reopen" we want a function that behaves much like
"blockdev-add". We want to ignore the previous set of options so that
only the ones actually specified by the user are applied, with the
rest having their default values.

One of the things that we need is a way to tell bdrv_reopen_queue()
whether we want to keep the old set of options or not, and that's what
this patch does. All current callers are setting this new parameter to
true and x-blockdev-reopen will set it to false.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 20:30:14 +01:00
Alberto Garcia 6585493369 block: Freeze the backing chain for the duration of the stream job
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 20:30:14 +01:00
Alberto Garcia ef53dc09ed block: Freeze the backing chain for the duration of the mirror job
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 20:30:14 +01:00
Alberto Garcia df827336ab block: Freeze the backing chain for the duration of the commit job
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 20:30:14 +01:00
Kevin Wolf 23dece19da file-posix: Make auto-read-only dynamic
Until now, with auto-read-only=on we tried to open the file read-write
first and if that failed, read-only was tried. This is actually not good
enough for libvirt, which gives QEMU SELinux permissions for read-write
only as soon as it actually intends to write to the image. So we need to
be able to switch between read-only and read-write at runtime.

This patch makes auto-read-only dynamic, i.e. the file is opened
read-only as long as no user of the node has requested write
permissions, but it is automatically reopened read-write as soon as the
first writer is attached. Conversely, if the last writer goes away, the
file is reopened read-only again.

bs->read_only is no longer set for auto-read-only=on files even if the
file descriptor is opened read-only because it will be transparently
upgraded as soon as a writer is attached. This changes the output of
qemu-iotests 232.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 20:30:14 +01:00
Kevin Wolf 6ceabe6f77 file-posix: Prepare permission code for fd switching
In order to be able to dynamically reopen the file read-only or
read-write, depending on the users that are attached, we need to be able
to switch to a different file descriptor during the permission change.

This interacts with reopen, which also creates a new file descriptor and
performs permission changes internally. In this case, the permission
change code must reuse the reopen file descriptor instead of creating a
third one.

In turn, reopen can drop its code to copy file locks to the new file
descriptor because that is now done when applying the new permissions.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 20:30:14 +01:00
Kevin Wolf a6aeca0ca5 file-posix: Lock new fd in raw_reopen_prepare()
There is no reason why we can take locks on the new file descriptor only
in raw_reopen_commit() where error handling isn't possible any more.
Instead, we can already do this in raw_reopen_prepare().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 20:30:14 +01:00
Kevin Wolf e0c9cf3a48 file-posix: Store BDRVRawState.reopen_state during reopen
We'll want to access the file descriptor in the reopen_state while
processing permission changes in the context of the repoen.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 20:30:14 +01:00
Kevin Wolf 5cec287025 file-posix: Factor out raw_reconfigure_getfd()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 20:30:14 +01:00
Vladimir Sementsov-Ogievskiy cb8aac3783 qapi: drop x- from x-block-latency-histogram-set
Drop x- and x_ prefixes for latency histograms and update version to
4.0

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 20:30:08 +01:00
John Snow d19c6b36ff block/qcow2-bitmap: Allow resizes with persistent bitmaps
Since we now load all bitmaps into memory anyway, we can just truncate
them in-memory and then flush them back to disk. Just in case, we will
still check and enforce that this shortcut is valid -- i.e. that any
bitmap described on-disk is indeed in-memory and can be modified.

If there are any inconsistent bitmaps, we refuse to allow the truncate
as we do not actually load these bitmaps into memory, and it isn't safe
or reasonable to attempt to truncate corrupted data.

Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190311185147.52309-4-vsementsov@virtuozzo.com
  [vsementsov: drop bitmap flushing, fix block comments style]
Signed-off-by: John Snow <jsnow@redhat.com>
2019-03-12 14:57:38 -04:00
Vladimir Sementsov-Ogievskiy bf5f0cf5d8 block/qcow2-bitmap: Don't check size for IN_USE bitmap
We are going to allow image resize when there are persistent bitmaps.
It may lead to appearing of inconsistent bitmaps (IN_USE=1) with
inconsistent size. But we still want to load them as inconsistent.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190311185147.52309-3-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-03-12 14:50:28 -04:00
Eric Blake 796a3798ab bitmaps: Fix typo in function name
Commit a88b179f introduced the ability to set and query bitmap
persistence, but with an atypical spelling.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 20190308205845.25734-1-eblake@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-03-12 12:05:49 -04:00
John Snow 74da6b9435 block/dirty-bitmaps: implement inconsistent bit
Set the inconsistent bit on load instead of rejecting such bitmaps.
There is no way to un-set it; the only option is to delete the bitmap.

Obvervations:
- bitmap loading does not need to update the header for in_use bitmaps.
- inconsistent bitmaps don't need to have their data loaded; they're
  glorified corruption sentinels.
- bitmap saving does not need to save inconsistent bitmaps back to disk.
- bitmap reopening DOES need to drop the readonly flag from inconsistent
  bitmaps to allow reopening of qcow2 files with non-qemu-owned bitmaps
  being eventually flushed back to disk.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20190301191545.8728-8-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-03-12 12:05:49 -04:00
John Snow cb8e58e3de block/dirty-bitmaps: disallow busy bitmaps as merge source
We didn't do any state checking on source bitmaps at all,
so this adds inconsistent and busy checks. readonly is
allowed, so you can still copy a readonly bitmap to a new
destination to use it for operations like drive-backup.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190301191545.8728-7-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-03-12 12:05:49 -04:00
John Snow 3ae96d6684 block/dirty-bitmaps: add block_dirty_bitmap_check function
Instead of checking against busy, inconsistent, or read only directly,
use a check function with permissions bits that let us streamline the
checks without reproducing them in many places.

Included in this patch are permissions changes that simply add the
inconsistent check to existing permissions call spots, without
addressing existing bugs.

In general, this means that busy+readonly checks become BDRV_BITMAP_DEFAULT,
which checks against all three conditions. busy-only checks become
BDRV_BITMAP_ALLOW_RO.

Notably, remove allows inconsistent bitmaps, so it doesn't follow the pattern.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190301191545.8728-4-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-03-12 12:05:49 -04:00
John Snow 0064cfefa4 block/dirty-bitmap: add inconsistent status
Even though the status field is deprecated, we still have to support
it for a few more releases. Since this is a very new kind of bitmap
state, it makes sense for it to have its own status field.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190301191545.8728-3-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-03-12 12:05:49 -04:00
John Snow b0f455599d block/dirty-bitmaps: add inconsistent bit
Add an inconsistent bit to dirty-bitmaps that allows us to report a bitmap as
persistent but potentially inconsistent, i.e. if we find bitmaps on a qcow2
that have been marked as "in use".

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190301191545.8728-2-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-03-12 12:05:49 -04:00
John Snow 1e6fddcd6f block/dirty-bitmaps: move comment block
Simply move the big status enum comment block to above the status
function, and document it as being deprecated. The whole confusing
block can get deleted in three releases time.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190223000614.13894-9-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-03-12 12:05:48 -04:00
John Snow 27a1b301a4 block/dirty-bitmaps: unify qmp_locked and user_locked calls
These mean the same thing now. Unify them and rename the merged call
bdrv_dirty_bitmap_busy to indicate semantically what we are describing,
as well as help disambiguate from the various _locked and _unlocked
versions of bitmap helpers that refer to mutex locks.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190223000614.13894-8-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-03-12 12:05:48 -04:00
John Snow 21d2376f26 block/dirty-bitmap: explicitly lock bitmaps with successors
Instead of implying a user_locked/busy status, make it explicit.
Now, bitmaps in use by migration, NBD or backup operations
are all treated the same way with the same code paths.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190223000614.13894-7-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-03-12 12:05:48 -04:00
John Snow 8b2e20f64f block/dirty-bitmap: change semantics of enabled predicate
Currently, the enabled predicate means something like:
"the QAPI status of the bitmap is ACTIVE."
After this patch, it should mean exclusively:
"This bitmap is recording guest writes, and is allowed to do so."

In many places, this is how this predicate was already used.
Internal usages of the bitmap QPI can call user_locked to find out if
the bitmap is in use by an operation.

To accommodate this, modify the create_successor routine to now
explicitly disable the parent bitmap at creation time.

Justifications:

1. bdrv_dirty_bitmap_status suffers no change from the lack of
   1:1 parity with the new predicates because of the order in which
   the predicates are checked. This is now only for compatibility.

2. bdrv_set_dirty() is unchanged: pre-patch, it was skipping bitmaps that were
   disabled or had a successor, while post-patch it is only skipping bitmaps
   that are disabled. To accommodate this, create_successor now ensures that
   any bitmap with a successor is explicitly disabled.

3. qcow2_store_persistent_dirty_bitmaps: No functional change. This function
   cares only about the literal enabled bit, and makes no effort to check if
   the bitmap is in-use or not. After this patch there are still no ways to
   produce an enabled bitmap with a successor.

4. block_dirty_bitmap_enable_prepare
   block_dirty_bitmap_disable_prepare
   init_dirty_bitmap_migration
   nbd_export_new

   These functions care about the literal enabled bit,
   and already check user_locked separately.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190223000614.13894-5-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-03-12 12:05:48 -04:00
John Snow c28ddbb07e block/dirty-bitmap: remove set/reset assertions against enabled bit
bdrv_set_dirty_bitmap and bdrv_reset_dirty_bitmap are only used as an
internal API by the mirror and migration areas of our code. These
calls modify the bitmap, but do so at the behest of QEMU and not the
guest.

Presently, these bitmaps are always "enabled" anyway, but there's no
reason they have to be.

Modify these internal APIs to drop this assertion.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190223000614.13894-4-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-03-12 12:05:48 -04:00
John Snow 50a47257f8 block/dirty-bitmaps: rename frozen predicate helper
"Frozen" was a good description a long time ago, but it isn't adequate now.
Rename the frozen predicate to has_successor to make the semantics of the
predicate more clear to outside callers.

In the process, remove some calls to frozen() that no longer semantically
make sense. For bdrv_enable_dirty_bitmap_locked and
bdrv_disable_dirty_bitmap_locked, it doesn't make sense to prohibit QEMU
internals from performing this action when we only wished to prohibit QMP
users from issuing these commands. All of the QMP API commands for bitmap
manipulation already check against user_locked() to prohibit these actions.

Several other assertions really want to check that the bitmap isn't in-use
by another operation -- use the bitmap_user_locked function for this instead,
which presently also checks for has_successor. This leaves some redundant
checks of has_successor through different helpers that are addressed in
forthcoming patches.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190223000614.13894-3-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-03-12 12:05:48 -04:00
John Snow 4db6ceb0b5 block/dirty-bitmap: add recording and busy properties
The current API allows us to report a single status, which we've defined as:

Frozen: has a successor, treated as qmp_locked, may or may not be enabled.
Locked: no successor, qmp_locked. may or may not be enabled.
Disabled: Not frozen or locked, disabled.
Active: Not frozen, locked, or disabled.

The problem is that both "Frozen" and "Locked" mean nearly the same thing,
and that both of them do not intuit whether they are recording guest writes
or not.

This patch deprecates that status field and introduces two orthogonal
properties instead to replace it.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190223000614.13894-2-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-03-12 12:05:48 -04:00
Niels de Vos 0e3b891fef gluster: the glfs_io_cbk callback function pointer adds pre/post stat args
The glfs_*_async() functions do a callback once finished. This callback
has changed its arguments, pre- and post-stat structures have been
added. This makes it possible to improve caching, which is useful for
Samba and NFS-Ganesha, but not so much for QEMU. Gluster 6 is the first
release that includes these new arguments.

With an additional detection in ./configure, the new arguments can
conditionally get included in the glfs_io_cbk handler.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 14:26:49 +01:00
Prasanna Kumar Kalever e014dbe74e gluster: Handle changed glfs_ftruncate signature
New versions of Glusters libgfapi.so have an updated glfs_ftruncate()
function that returns additional 'struct stat' structures to enable
advanced caching of attributes. This is useful for file servers, not so
much for QEMU. Nevertheless, the API has changed and needs to be
adopted.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 14:26:49 +01:00
Philippe Mathieu-Daudé d4cef0c67c block/iscsi: Restrict Linux-specific code
Some Linux specific code is missing guards, leading to
build failure on OSX:

  $ sudo brew install libiscsi
  $ ./configure && make
  [...]
    CC      block/iscsi.o
  qemu/block/iscsi.c:338:24: error: 'iscsi_aiocb_info' defined but not used [-Werror=unused-const-variable=]
   static const AIOCBInfo iscsi_aiocb_info = {
                          ^~~~~~~~~~~~~~~~
  qemu/block/iscsi.c:168:1: error: 'iscsi_schedule_bh' defined but not used [-Werror=unused-function]
   iscsi_schedule_bh(IscsiAIOCB *acb)
   ^~~~~~~~~~~~~~~~~
  cc1: all warnings being treated as errors

Add guards to restrict this code for Linux.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190220000553.28438-1-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-11 16:33:49 +01:00
Kevin Wolf 6c3944dc62 qcow2: Implement data-file-raw create option
Provide an option to force QEMU to always keep the external data file
consistent as a standalone read-only raw image.

At the moment, this means making sure that write_zeroes requests are
forwarded to the data file instead of just updating the metadata, and
checking that no backing file is used.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-08 12:26:46 +01:00
Kevin Wolf 9b890bdcb6 qcow2: Store data file name in the image
Rather than requiring that the external data file node is passed
explicitly when creating the qcow2 node, store the filename in the
designated header extension during .bdrv_create and read it from there
as a default during .bdrv_open.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-08 12:26:46 +01:00
Kevin Wolf dcc98687f8 qcow2: Creating images with external data file
This adds a .bdrv_create option to use an external data file.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-08 12:26:46 +01:00
Kevin Wolf 0e8c08be27 qcow2: Add basic data-file infrastructure
This adds a .bdrv_open option to specify the external data file node.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-08 12:26:46 +01:00
Kevin Wolf e9f5b6deaa qcow2: Support external data file in qemu-img check
For external data files, data clusters must be excluded from the
refcount calculations. Instead, an implicit refcount of 1 is assumed for
the COPIED flag.

Compressed clusters and internal snapshots are incompatible with
external data files, so print an error if they are in use for images
with an external data file.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-08 12:26:46 +01:00
Kevin Wolf aa8b34c1b2 qcow2: Return error for snapshot operation with data file
Internal snapshots and an external data file are incompatible because
snapshots require refcounting and non-linear mapping. Return an error
for all of the snapshot operations if an external data file is in use.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-08 12:26:46 +01:00
Kevin Wolf 966b000f49 qcow2: External file I/O
This changes the qcow2 implementation to direct all guest data I/O to
s->data_file rather than bs->file, while metadata I/O still uses
bs->file. At the moment, this is still always the same, but soon we'll
add options to set s->data_file to an external data file.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-08 12:26:46 +01:00
Kevin Wolf 37be14036b qcow2: Prepare qcow2_co_block_status() for data file
Offset 0 cannot be assumed to mean an unallocated cluster any more.
Instead, the cluster type needs to be checked.

*file must refer to the data file instead of the image file if a valid
offset is returned from qcow2_co_block_status().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-08 12:26:45 +01:00
Kevin Wolf 77e023ff79 qcow2: Return 0/-errno in qcow2_alloc_compressed_cluster_offset()
qcow2_alloc_compressed_cluster_offset() used to return the cluster
offset for success and 0 for error. This doesn't only conflict with 0 as
a valid host offset, but also loses the error code.

Similar to the change made to qcow2_alloc_cluster_offset() for
uncompressed clusters in commit 148da7ea9d, make the function return
0/-errno and return the allocated cluster offset in a by-reference
parameter.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-08 12:26:45 +01:00
Kevin Wolf c6d619cc12 qcow2: Don't assume 0 is an invalid cluster offset
The cluster allocation code uses 0 as an invalid offset that is used in
case of errors or as "offset not yet determined". With external data
files, a host cluster offset of 0 becomes valid, though.

Define a constant INV_OFFSET (which is not cluster aligned and will
therefore never be a valid offset) that can be used for such purposes.

This removes the additional host_offset == 0 check that commit
ff52aab2df introduced; the confusion between an invalid offset and
(erroneous) allocation at offset 0 is removed with this change.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-08 12:26:45 +01:00
Kevin Wolf b8c8353a38 qcow2: Prepare count_contiguous_clusters() for external data file
Offset 0 can be valid for normal (allocated) clusters now, so use
qcow2_get_cluster_type() instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-08 12:26:45 +01:00
Kevin Wolf a4ea184d8a qcow2: Prepare qcow2_get_cluster_type() for external data file
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-08 12:26:45 +01:00
Kevin Wolf 808c2bb4c4 qcow2: Pass bs to qcow2_get_cluster_type()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-08 12:26:45 +01:00
Kevin Wolf 93c2493646 qcow2: Basic definitions for external data files
This adds basic constants, struct fields and helper function for
external data file support to the implementation.

QCOW2_INCOMPAT_MASK and QCOW2_AUTOCLEAR_MASK are not updated yet so that
opening images with an external data file still fails (we don't handle
them correctly yet).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-08 12:26:45 +01:00
Kevin Wolf c5e86ebc11 qcow2: Simplify preallocation code
Image creation already involves a bdrv_co_truncate() call, which allows
to specify a preallocation mode. Just pass the right mode there and
remove the code that is made redundant by this.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-08 12:26:45 +01:00
Alberto Garcia af39bd0d9a qcow2: Default to 4KB for the qcow2 cache entry size
QEMU 2.12 (commit 1221fe6f63) introduced
a new setting called l2-cache-entry-size that allows making entries on
the qcow2 L2 cache smaller than the cluster size.

I have been performing several tests with different cluster and entry
sizes and all of them show that reducing the entry size (aka L2 slice)
consistently improves I/O performance, notably during random I/O (all
tests done with sequential I/O show similar results). This is to be
expected because loading and evicting an L2 slice is more expensive
the larger the slice is.

Here are some numbers on fully populated 40GB qcow2 images. The
rightmost column represents the maximum L2 cache size in both cases.

   Cluster size = 64 KB
   |-------------+--------------+--------------+--------------|
   |             | 1MB L2 cache | 3MB L2 cache | 5MB L2 cache |
   |-------------+--------------+--------------+--------------|
   |  4KB slices |    6545 IOPS |   12045 IOPS |   55680 IOPS |
   | 16KB slices |    5177 IOPS |    9798 IOPS |   56278 IOPS |
   | 64KB slices |    2718 IOPS |    5326 IOPS |   57355 IOPS |
   |-------------+--------------+--------------+--------------|

   Cluster size = 256 KB
   |--------------+----------------+--------------+-----------------|
   |              | 512KB L2 cache | 1MB L2 cache | 1280KB L2 cache |
   |--------------+----------------+--------------+-----------------|
   |   4KB slices |      8539 IOPS |   21071 IOPS |      55417 IOPS |
   |  64KB slices |      3598 IOPS |    9772 IOPS |      57687 IOPS |
   | 256KB slices |      1415 IOPS |    4120 IOPS |      58001 IOPS |
   |--------------+----------------+--------------+-----------------|

As can be seen in the numbers, the only exception to the rule is when
the cache is large enough to hold all L2 tables. This is also to be
expected because in this case no cache entry is ever evicted so
reducing its size doesn't bring any benefit.

This patch sets the default L2 cache entry size to 4KB except when the
cache is large enough for the whole disk.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-08 12:26:45 +01:00
Peter Maydell adf2e451f3 Block layer patches:
- Block graph change fixes (avoid loops, cope with non-tree graphs)
 - bdrv_set_aio_context() related fixes
 - HMP snapshot commands: Use only tag, not the ID to identify snapshots
 - qmeu-img, commit: Error path fixes
 - block/nvme: Build fix for gcc 9
 - MAINTAINERS updates
 - Fix various issues with bdrv_refresh_filename()
 - Fix various iotests
 - Include LUKS overhead in qemu-img measure for qcow2
 - A fix for vmdk's image creation interface
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJcc/knAAoJEH8JsnLIjy/WptQP/3F8Lh52H4egXaP7NUUuDjQM
 AhqhuDAp/EZBS+xim9kLTogNJADe/rMWdSX/YB5aLpSPYbjasC66NgaLhd6QewgQ
 VIcsLUdlYAyZ5ZjJytimfMTLwm1X02RmVIe55y52DTY8LlfViZzOlf3qwqPm00ao
 EJB2cl8UJLM+PVEu59cCw3R0/06LY+WIJRB32d3tnCBRTkaJwfR9h4lrp/juVcFZ
 U+2eWU68KMbUHSYiWANowN+KRV3uPY4HVA98v3F0vDmcBxlVHOeBg6S+PcT7tK8p
 huzCMwcdwUyPMJgVs/+WBtUnbG0jN6SHUYmFLz859UMVgBnCw5tzBMf8qw1wOA4A
 Iw+zor27Pxj4IlxcLPp5f97YZ8k9acdMR2VKPH6xLJZ1JF+sKa54RfzESd5EJeIj
 Mfcp773H0lIaWcFJ6RY1F0L1E1ta7QigwNBiWMdYfh0a0EWHnDvGyYeaSPYEQ+rl
 e8bZOcfrYwVI7DTDiZOIkGA9D8DXEPDNp+sl6s1DxeY69D0NNaXTtCPqFNNAbFbd
 20uD7yDRZlWq32cQB/K9D5cSkZRSOzdUpLfLU3nQU2+dz11x6OpM6m7DVboSrztD
 1HtPPDzDEvH5dOP7ibd60s+ntjkSiNfNkUgnuVrBE/d/PocC1eHHpZt5V7f43Ofb
 RxVwH5+smzQ9nsNBfQR0
 =gaah
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- Block graph change fixes (avoid loops, cope with non-tree graphs)
- bdrv_set_aio_context() related fixes
- HMP snapshot commands: Use only tag, not the ID to identify snapshots
- qmeu-img, commit: Error path fixes
- block/nvme: Build fix for gcc 9
- MAINTAINERS updates
- Fix various issues with bdrv_refresh_filename()
- Fix various iotests
- Include LUKS overhead in qemu-img measure for qcow2
- A fix for vmdk's image creation interface

# gpg: Signature made Mon 25 Feb 2019 14:18:15 GMT
# gpg:                using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (71 commits)
  iotests: Skip 211 on insufficient memory
  vmdk: false positive of compat6 with hwversion not set
  iotests: add LUKS payload overhead to 178 qemu-img measure test
  qcow2: include LUKS payload overhead in qemu-img measure
  iotests.py: s/_/-/g on keys in qmp_log()
  iotests: Let 045 be run concurrently
  iotests: Filter SSH paths
  iotests.py: Filter filename in any string value
  iotests.py: Add is_str()
  iotests: Fix 207 to use QMP filters for qmp_log
  iotests: Fix 232 for LUKS
  iotests: Remove superfluous rm from 232
  iotests: Fix 237 for Python 2.x
  iotests: Re-add filename filters
  iotests: Test json:{} filenames of internal BDSs
  block: BDS options may lack the "driver" option
  block/null: Generate filename even with latency-ns
  block/curl: Implement bdrv_refresh_filename()
  block/curl: Harmonize option defaults
  block/nvme: Fix bdrv_refresh_filename()
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-02-26 19:04:47 +00:00
yuchenlin 26c9296c31 vmdk: false positive of compat6 with hwversion not set
In vmdk_co_create_opts, when it finds hw_version is undefined, it will
set it to 4, which misleading the compat6 and hwversion in
vmdk_co_do_create. Simply set hw_version to NULL after free, let
the logic in vmdk_co_do_create to decide the value of hw_version.

This bug can be reproduced by:

$ qemu-img convert -O vmdk -o subformat=streamOptimized,compat6
/home/yuchenlin/syno.qcow2 /home/yuchenlin/syno.vmdk

qemu-img: /home/yuchenlin/syno.vmdk: error while converting vmdk:
compat6 cannot be enabled with hwversion set

Signed-off-by: yuchenlin <yuchenlin@synology.com>
Message-id: 20190221110805.28239-1-yuchenlin@synology.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:28 +01:00
Stefan Hajnoczi 61914f8906 qcow2: include LUKS payload overhead in qemu-img measure
LUKS encryption reserves clusters for its own payload data.  The size of
this area must be included in the qemu-img measure calculation so that
we arrive at the correct minimum required image size.

(Ab)use the qcrypto_block_create() API to determine the payload
overhead.  We discard the payload data that qcrypto thinks will be
written to the image.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190218104525.23674-2-stefanha@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:28 +01:00
Max Reitz 1e47cb7f52 block/null: Generate filename even with latency-ns
While we cannot represent the latency-ns option in a filename, it is not
a strong option so not being able to should not stop us from generating
a filename nonetheless.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-30-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:27 +01:00
Max Reitz 937c007b6e block/curl: Implement bdrv_refresh_filename()
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-29-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:27 +01:00
Max Reitz 712b64e8f3 block/curl: Harmonize option defaults
Both of the defaults we currently have in the curl driver are named
based on a slightly different schema, let's unify that and call both
CURL_BLOCK_OPT_${NAME}_DEFAULT.

While at it, we can add a macro for the third option for which a default
exists, namely "sslverify".

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-28-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:27 +01:00
Max Reitz cc61b0740f block/nvme: Fix bdrv_refresh_filename()
Currently, nvme's bdrv_refresh_filename() is an exact copy of null's
implementation.  However, for null, "null-co://" and "null-aio://" are
indeed valid filenames -- for nvme, they are not, as a device address is
still required.

The correct implementation should generate a filename of the form
"nvme://[PCI address]/[namespace]" (as the comment above
nvme_parse_filename() describes).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-27-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:27 +01:00
Max Reitz 998b3a1e5a block: Purify .bdrv_refresh_filename()
Currently, BlockDriver.bdrv_refresh_filename() is supposed to both
refresh the filename (BDS.exact_filename) and set BDS.full_open_options.
Now that we have generic code in the central bdrv_refresh_filename() for
creating BDS.full_open_options, we can drop the latter part from all
BlockDriver.bdrv_refresh_filename() implementations.

This also means that we can drop all of the existing default code for
this from the global bdrv_refresh_filename() itself.

Furthermore, we now have to call BlockDriver.bdrv_refresh_filename()
after having set BDS.full_open_options, because the block driver's
implementation should now be allowed to depend on BDS.full_open_options
being set correctly.

Finally, with this patch we can drop the @options parameter from
BlockDriver.bdrv_refresh_filename(); also, add a comment on this
function's purpose in block/block_int.h while touching its interface.

This completely obsoletes blklogwrite's implementation of
.bdrv_refresh_filename().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190201192935.18394-25-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:27 +01:00
Max Reitz abc521a9aa block: Add BlockDriver.bdrv_gather_child_options
Some follow-up patches will rework the way bs->full_open_options is
refreshed in bdrv_refresh_filename(). The new implementation will remove
the need for the block drivers' bdrv_refresh_filename() implementations
to set bs->full_open_options; instead, it will be generic and use static
information from each block driver.

However, by implementing bdrv_gather_child_options(), block drivers will
still be able to override the way the full_open_options of their
children are incorporated into their own.

We need to implement this function for VMDK because we have to prevent
the generic implementation from gathering the options of all children:
It is not possible to specify options for the extents through the
runtime options.

For quorum, the child names that would be used by the generic
implementation and the ones that we actually (currently) want to use
differ. See quorum_gather_child_options() for more information.

Note that both of these are cases which are not ideal: In case of VMDK
it would probably be nice to be able to specify options for all extents.
In case of quorum, the current runtime option structure is simply broken
and needs to be fixed (but that is left for another patch).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-23-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:27 +01:00
Max Reitz 2654267cc1 block: Add strong_runtime_opts to BlockDriver
This new field can be set by block drivers to list the runtime options
they accept that may influence the contents of the respective BDS. As of
a follow-up patch, this list will be used by the common
bdrv_refresh_filename() implementation to decide which options to put
into BDS.full_open_options (and consequently whether a JSON filename has
to be created), thus freeing the drivers of having to implement that
logic themselves.

Additionally, this patch adds the field to all of the block drivers that
need it and sets it accordingly.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-22-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:27 +01:00
Max Reitz 0dcbc54a95 block/nfs: Implement bdrv_dirname()
While the basic idea is obvious and could be handled by the default
bdrv_dirname() implementation, we cannot generate a directory name if
the gid or uid are set, so we have to explicitly return NULL in those
cases.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-19-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:26 +01:00
Max Reitz 8a6239c071 block/nbd: Make bdrv_dirname() return NULL
The generic bdrv_dirname() implementation would be able to generate some
form of directory name for many NBD nodes, but it would be always wrong.
Therefore, we have to explicitly make it an error (until NBD has some
form of specification for export paths, if it ever will).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20190201192935.18394-18-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:26 +01:00
Max Reitz f3037bd254 quorum: Make bdrv_dirname() return NULL
While the common implementation for bdrv_dirname() should return NULL
for quorum BDSs already (because they do not have a file node and their
exact_filename field should be empty), there is no reason not to make
that explicit.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-17-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:26 +01:00
Max Reitz 27953572a5 blkverify: Make bdrv_dirname() return NULL
blkverify's BDSs have a file BDS, but we do not want this to be
preferred over the raw node. There is no way to decide between the two
(and not really a reason to, either), so just return NULL in blkverify's
implementation of bdrv_dirname().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-16-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:26 +01:00
Max Reitz 6b6833c1b4 block: bdrv_get_full_backing_filename's ret. val.
Make bdrv_get_full_backing_filename() return an allocated string instead
of placing the result in a caller-provided buffer.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-12-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:26 +01:00
Max Reitz 645ae7d88e block: bdrv_get_full_backing_filename_from_...'s ret. val.
Make bdrv_get_full_backing_filename_from_filename() return an allocated
string instead of placing the result in a caller-provided buffer.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190201192935.18394-11-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:26 +01:00
Max Reitz 009b03aaa2 block: Make path_combine() return the path
Besides being safe for arbitrary path lengths, after some follow-up
patches all callers will want a freshly allocated buffer anyway.

In the meantime, path_combine_deprecated() is added which has the same
interface as path_combine() had before this patch. All callers to that
function will be converted in follow-up patches.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20190201192935.18394-10-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:26 +01:00
Max Reitz 998c201923 block: Add BDS.auto_backing_file
If the backing file is overridden, this most probably does change the
guest-visible data of a BDS.  Therefore, we will need to consider this
in bdrv_refresh_filename().

To see whether it has been overridden, we might want to compare
bs->backing_file and bs->backing->bs->filename.  However,
bs->backing_file is changed by bdrv_set_backing_hd() (which is just used
to change the backing child at runtime, without modifying the image
header), so bs->backing_file most of the time simply contains a copy of
bs->backing->bs->filename anyway, so it is useless for such a
comparison.

This patch adds an auto_backing_file BDS field which contains the
backing file path as indicated by the image header, which is not changed
by bdrv_set_backing_hd().

Because of bdrv_refresh_filename() magic, however, a BDS's filename may
differ from what has been specified during bdrv_open().  Then, the
comparison between bs->auto_backing_file and bs->backing->bs->filename
may fail even though bs->backing was opened from bs->auto_backing_file.
To mitigate this, we can copy the real BDS's filename (after the whole
bdrv_open() and bdrv_refresh_filename() process) into
bs->auto_backing_file, if we know the former has been opened based on
the latter.  This is only possible if no options modifying the backing
file's behavior have been specified, though.  To simplify things, this
patch only copies the filename from the backing file if no options have
been specified for it at all.

Furthermore, there are cases where an overlay is created by qemu which
already contains a BDS's filename (e.g. in blockdev-snapshot-sync).  We
do not need to worry about updating the overlay's bs->auto_backing_file
there, because we actually wrote a post-bdrv_refresh_filename() filename
into the image header.

So all in all, there will be false negatives where (as of a future
patch) bdrv_refresh_filename() will assume that the backing file differs
from what was specified in the image header, even though it really does
not.  However, these cases should be limited to where (1) the user
actually did override something in the backing chain (e.g. by specifying
options for the backing file), or (2) the user executed a QMP command to
change some node's backing file (e.g. change-backing-file or
block-commit with @backing-file given) where the given filename does not
happen to coincide with qemu's idea of the backing BDS's filename.

Then again, (1) really is limited to -drive.  With -blockdev or
blockdev-add, you have to adhere to the schema, so a user cannot give
partial "unimportant" options (e.g. by just setting backing.node-name
and leaving the rest to the image header).  Therefore, trying to fix
this would mean trying to fix something for -drive only.

To improve on (2), we would need a full infrastructure to "canonicalize"
an arbitrary filename (+ options), so it can be compared against
another.  That seems a bit over the top, considering that filenames
nowadays are there mostly for the user's entertainment.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-5-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:25 +01:00
Max Reitz e24518e303 block: Use children list in bdrv_refresh_filename
bdrv_refresh_filename() should invoke itself recursively on all
children, not just on file.

With that change, we can remove the manual invocations in blkverify,
quorum, commit, mirror, and blklogwrites.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-3-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:25 +01:00
Max Reitz f30c66ba6e block: Use bdrv_refresh_filename() to pull
Before this patch, bdrv_refresh_filename() is used in a pushing manner:
Whenever the BDS graph is modified, the parents of the modified edges
are supposed to be updated (recursively upwards).  However, that is
nonviable, considering that we want child changes not to concern
parents.

Also, in the long run we want a pull model anyway: Here, we would have a
bdrv_filename() function which returns a BDS's filename, freshly
constructed.

This patch is an intermediate step.  It adds bdrv_refresh_filename()
calls before every place a BDS.filename value is used.  The only
exceptions are protocol drivers that use their own filename, which
clearly would not profit from refreshing that filename before.

Also, bdrv_get_encrypted_filename() is removed along the way (as a user
of BDS.filename), since it is completely unused.

In turn, all of the calls to bdrv_refresh_filename() before this patch
are removed, because we no longer have to call this function on graph
changes.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190201192935.18394-2-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25 15:11:25 +01:00
Thomas Huth 83c68e149a block/nvme: Remove QEMU_PACKED from naturally aligned NVMeRegs struct
The QEMU_PACKED is causing a compiler warning/error with GCC 9:

  CC      block/nvme.o
block/nvme.c: In function ‘nvme_create_queue_pair’:
block/nvme.c:209:22: error: taking address of packed member of
 ‘struct <anonymous>’ may result in an unaligned pointer value
 [-Werror=address-of-packed-member]
  209 |     q->sq.doorbell = &s->regs->doorbells[idx * 2 * s->doorbell_scale];

All members of the struct are naturally aligned, so there should
not be the need for QEMU_PACKED here, and the following QEMU_BUILD_BUG_ON
also ensures that there is no padding. Thus simply remove the QEMU_PACKED
here.

Buglink: https://bugs.launchpad.net/qemu/+bug/1817525
Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-25 15:09:48 +01:00
Alberto Garcia c1c4399084 qcow2: Assert that L2 table offsets fit in the L1 table
L1 table entries have a field to store the offset of an L2 table.
The rest of the bits of the entry are currently reserved except from
bit 63, which stores the COPIED flag.

The offset is always taken from the entry using L1E_OFFSET_MASK to
ensure that we only use the bits that belong to that field.

While that mask is used every time we read from the L1 table, it is
never used when we write to it. Due to the limits set elsewhere in the
code QEMU can never produce L2 table offsets that don't fit in that
field so any such offset when allocating an L2 table would indicate a
bug in QEMU.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-25 15:05:23 +01:00
Kevin Wolf 28e0b2d2e1 nbd: Increase bs->in_flight during AioContext switch
bdrv_drain() must not leave connection_co scheduled, so bs->in_flight
needs to be increased while the coroutine is waiting to be scheduled
in the new AioContext after nbd_client_attach_aio_context().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-25 15:03:19 +01:00
Kevin Wolf d3bd5b9089 nbd: Use low-level QIOChannel API in nbd_read_eof()
Instead of using the convenience wrapper qio_channel_read_all_eof(), use
the lower level QIOChannel API. This means duplicating some code, but
we'll need this because this coroutine yield is special: We want it to
be interruptible so that nbd_client_attach_aio_context() can correctly
reenter the coroutine.

This moves the bdrv_dec/inc_in_flight() pair into nbd_read_eof(), so
that connection_co will always sit in this exact qio_channel_yield()
call when bdrv_drain() returns.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2019-02-25 15:03:19 +01:00
Kevin Wolf 5ad81b4946 nbd: Restrict connection_co reentrance
nbd_client_attach_aio_context() schedules connection_co in the new
AioContext and this way reenters it in any arbitrary place that has
yielded. We can restrict this a bit to the function call where the
coroutine actually sits waiting when it's idle.

This doesn't solve any bug yet, but it shows where in the code we need
to support this random reentrance and where we don't have to care.

Add FIXME comments for the existing bugs that the rest of this series
will fix.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2019-02-25 15:03:19 +01:00
Kevin Wolf c90e2a9cfd block-backend: Make blk_inc/dec_in_flight public
For some users of BlockBackends, just increasing the in_flight counter
is easier than implementing separate handlers in BlockDevOps. Make the
helper functions for this public.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-25 15:03:19 +01:00
Alberto Garcia 2468eed3be commit: Replace commit_top_bs on failure after deleting the block job
If there's an error in commit_start() then the block job must be
deleted before replacing commit_top_bs, otherwise it will fail because
of lack of permissions. This happens since the permission system was
introduced in 8dfba27977.

Fortunately this bug doesn't seem to be possible to reproduce at the
moment without changing the code.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-25 15:03:19 +01:00
Daniel Henrique Barboza 161e612d20 qcow2-snapshot: remove redundant find_snapshot_by_id_and_name call
In qcow2_snapshot_create there is the following code block:

    /* Generate an ID */
    find_new_snapshot_id(bs, sn_info->id_str, sizeof(sn_info->id_str));

    /* Check that the ID is unique */
    if (find_snapshot_by_id_and_name(bs, sn_info->id_str, NULL) >= 0) {
        return -EEXIST;
    }

find_new_snapshot_id cycles through all snapshots, getting the id_str
as an unsigned long int, calculating the max id_max value of all the
existing id_strs and writing in the id_str pointer id_max + 1:

    for(i = 0; i < s->nb_snapshots; i++) {
        sn = s->snapshots + i;
        id = strtoul(sn->id_str, NULL, 10);
        if (id > id_max)
            id_max = id;
    }
    snprintf(id_str, id_str_size, "%lu", id_max + 1);

Here, sn_info->id_str will have the unique value id_max + 1. Right
after that, find_snapshot_by_id_and_name is called with
id = sn_info->id_str and name = NULL. This will cause the function
to execute the following:

    } else if (id) {
        for (i = 0; i < s->nb_snapshots; i++) {
            if (!strcmp(s->snapshots[i].id_str, id)) {
                return i;
            }
        }
    }

In short, we're searching the existing snapshots to see if sn_info->id_str
matches any existing id, right after we set in the previous line a
sn_info->id_str value that is already unique.

The first code block goes way back to commit 585f8587ad, a 2006 commit from
Fabrice Bellard that simply says "new qcow2 disk image format". No more
info is provided about this logic in any subsequent commits that moved
this code block around.

I can't say about the original design, but the current logic is redundant.
bdrv_snapshot_create is called in aio_context lock, forbidding any
concurrent call to accidentally create a new snapshot between
the find_new_snapshot_id and find_snapshot_by_id_and_name calls. What
we're ending up doing is to cycle through the snapshots two times
for no viable reason.

This patch eliminates the redundancy by removing the 'id is unique'
check that calls find_snapshot_by_id_and_name.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-25 15:03:19 +01:00
Daniel Henrique Barboza 8c04093c8c block/snapshot: remove bdrv_snapshot_delete_by_id_or_name
After the previous patch, the only instance of this function left
is inside qemu-img.c.

qemu-img is using it inside the 'img_snapshot' function to delete
snapshots in the SNAPSHOT_DELETE case, based on a "snapshot_name"
string that refers to the tag, not ID, of the QEMUSnapshotInfo struct.
This can be verified by checking the SNAPSHOT_CREATE case that
comes shortly before SNAPSHOT_DELETE. In that case, the same
"snapshot_name" variable is being strcpy to the 'name' field
of the QEMUSnapshotInfo struct sn:

pstrcpy(sn.name, sizeof(sn.name), snapshot_name);

Based on that, it is unlikely that "snapshot_name" might contain
an "id" in SNAPSHOT_DELETE.

This patch changes SNAPSHOT_DELETE to use snapshot_find() and
snapshot_delete() instead of bdrv_snapshot_delete_by_id_or_name.
After that, there is no instances left of bdrv_snapshot_delete_by_id_or_name
in the code, so it is safe to remove it entirely.

Suggested-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-25 15:03:18 +01:00
Daniel Henrique Barboza 6ca080453e block/snapshot.c: eliminate use of ID input in snapshot operations
At this moment, QEMU attempts to create/load/delete snapshots
by using either an ID (id_str) or a name. The problem is that the code
isn't consistent of whether the entered argument is an ID or a name,
causing unexpected behaviors.

For example, when creating snapshots via savevm <arg>, what happens is that
"arg" is treated as both name and id_str. In a guest without snapshots, create
a single snapshot via savevm:

(qemu) savevm 0
(qemu) info snapshots
List of snapshots present on all disks:
ID        TAG                 VM SIZE                DATE       VM CLOCK
--        0                      741M 2018-07-31 13:39:56   00:41:25.313

A snapshot with name "0" is created. ID is hidden from the user, but the
ID is a non-zero integer that starts at "1". Thus, this snapshot has
id_str=1, TAG="0". Creating a second snapshot with arg = 1, the first one
is deleted:

(qemu) savevm 1
(qemu) info snapshots
List of snapshots present on all disks:
ID        TAG                 VM SIZE                DATE       VM CLOCK
--        1                      741M 2018-07-31 13:42:14   00:41:55.252

What happened?

- when creating the second snapshot, a verification is done inside
bdrv_all_delete_snapshot to delete any existing snapshots that matches an
string argument. Here, the code calls bdrv_all_delete_snapshot("1", ...);

- bdrv_all_delete_snapshot calls bdrv_snapshot_find(..., "1") for each
BlockDriverState of the guest. And this is where things goes tilting:
bdrv_snapshot_find does a search by both id_str and name. It finds
out that there is a snapshot that has id_str = 1, stores a reference
to the snapshot in the sn_info pointer and then returns match found;

- since a match was found, a call to bdrv_snapshot_delete_by_id_or_name() is
made. This function ignores the pointer written by bdrv_snapshot_find. Instead,
it deletes the snapshot using bdrv_snapshot_delete() calling it first with
id_str = 1. If it fails to delete, then it calls it again with name = 1.

- after all that, QEMU creates the new snapshot, that has id_str = 1 and
name = 1. The user is left wondering that happened with the first snapshot
created. Similar bugs can be triggered when using loadvm and delvm.

Before contemplating discarding the use of ID input in these operations,
I've searched the code of what would be the implications. My findings
are:

- the RBD and Sheepdog drivers don't care. Both uses the 'name' field as
key in their logic, making id_str = name when appropriate.
replay-snapshot.c does not make any special use of id_str;

- qcow2 uses id_str as an unique identifier but it is automatically
calculated, not being influenced by user input. Other than that, there are
no distinguish operations made only with id_str;

- in blockdev.c, the delete operation uses a match of both id_str AND
name. Given that id_str is either a copy of 'name' or auto-generated,
we're fine here.

This gives motivation to not consider ID as a valid user input in HMP
commands - sticking with 'name' input only is more consistent. To
accomplish that, the following changes were made in this patch:

- bdrv_snapshot_find() does not match for id_str anymore, only 'name'. The
function is called in save_snapshot(), load_snapshot(), bdrv_all_delete_snapshot()
and bdrv_all_find_snapshot(). This change makes the search function more
predictable and does not change the behavior of any underlying code that uses
these affected functions, which are related to HMP (which is fine) and the
main loop inside vl.c (which doesn't care about it anyways);

- bdrv_all_delete_snapshot() does not call bdrv_snapshot_delete_by_id_or_name
anymore. Instead, it uses the pointer returned by bdrv_snapshot_find to
erase the snapshot with the exact match of id_str an name. This function
is called in save_snapshot and hmp_delvm, thus this change  produces the
intended effect;

- documentation changes to reflect the new behavior. I consider this to
be an API fix instead of an API change - the user was already creating
snapshots using 'name', but now he/she will also enjoy a consistent
behavior.

Ideally we would get rid of the id_str field entirely, but this would have
repercussions on existing snapshots. Another day perhaps.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-25 15:03:18 +01:00
Vladimir Sementsov-Ogievskiy 199d95b043 block/vmdk: use qemu_iovec_init_buf
Use new qemu_iovec_init_buf() instead of
qemu_iovec_init_external( ... , 1), which simplifies the code.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190218140926.333779-12-vsementsov@virtuozzo.com
Message-Id: <20190218140926.333779-12-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-22 09:42:13 +00:00
Vladimir Sementsov-Ogievskiy 342544f98b block/qed: use qemu_iovec_init_buf
Use new qemu_iovec_init_buf() instead of
qemu_iovec_init_external( ... , 1), which simplifies the code.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190218140926.333779-11-vsementsov@virtuozzo.com
Message-Id: <20190218140926.333779-11-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-22 09:42:13 +00:00
Vladimir Sementsov-Ogievskiy c793d4ff20 block/qcow2: use qemu_iovec_init_buf
Use new qemu_iovec_init_buf() instead of
qemu_iovec_init_external( ... , 1), which simplifies the code.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190218140926.333779-10-vsementsov@virtuozzo.com
Message-Id: <20190218140926.333779-10-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-22 09:42:13 +00:00
Vladimir Sementsov-Ogievskiy 30d780f8fe block/qcow: use qemu_iovec_init_buf
Use new qemu_iovec_init_buf() instead of
qemu_iovec_init_external( ... , 1), which simplifies the code.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190218140926.333779-9-vsementsov@virtuozzo.com
Message-Id: <20190218140926.333779-9-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-22 09:42:13 +00:00
Vladimir Sementsov-Ogievskiy b2589d66b4 block/parallels: use QEMU_IOVEC_INIT_BUF
Use new QEMU_IOVEC_INIT_BUF() instead of
qemu_iovec_init_external( ... , 1), which simplifies the code.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190218140926.333779-8-vsementsov@virtuozzo.com
Message-Id: <20190218140926.333779-8-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-22 09:42:13 +00:00
Vladimir Sementsov-Ogievskiy 8040446d30 block/stream: use QEMU_IOVEC_INIT_BUF
Use new QEMU_IOVEC_INIT_BUF() instead of
qemu_iovec_init_external( ... , 1), which simplifies the code.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190218140926.333779-7-vsementsov@virtuozzo.com
Message-Id: <20190218140926.333779-7-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-22 09:42:13 +00:00
Vladimir Sementsov-Ogievskiy ee7a883ace block/commit: use QEMU_IOVEC_INIT_BUF
Use new QEMU_IOVEC_INIT_BUF() instead of
qemu_iovec_init_external( ... , 1), which simplifies the code.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190218140926.333779-6-vsementsov@virtuozzo.com
Message-Id: <20190218140926.333779-6-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-22 09:42:13 +00:00
Vladimir Sementsov-Ogievskiy 31e0a6e1af block/backup: use qemu_iovec_init_buf
Use new qemu_iovec_init_buf() instead of
qemu_iovec_init_external( ... , 1), which simplifies the code.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190218140926.333779-5-vsementsov@virtuozzo.com
Message-Id: <20190218140926.333779-5-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-22 09:42:13 +00:00
Vladimir Sementsov-Ogievskiy ae5a9592ce block/block-backend: use QEMU_IOVEC_INIT_BUF
Use new QEMU_IOVEC_INIT_BUF() instead of
qemu_iovec_init_external( ... , 1), which simplifies the code.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190218140926.333779-4-vsementsov@virtuozzo.com
Message-Id: <20190218140926.333779-4-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-22 09:42:13 +00:00
Vladimir Sementsov-Ogievskiy 0d93ed0845 block/io: use qemu_iovec_init_buf
Use new qemu_iovec_init_buf() instead of
qemu_iovec_init_external( ... , 1), which simplifies the code.

While being here, use qemu_try_blockalign0 as well.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190218140926.333779-3-vsementsov@virtuozzo.com
Message-Id: <20190218140926.333779-3-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-22 09:42:13 +00:00
John Snow 73ab5d601c block/dirty-bitmap: Documentation and Comment fixups
The meaning of the states has changed subtly over time,
this should bring the understanding more in-line with the
current, actual usages.

Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190202011048.12343-1-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-02-19 17:49:43 -05:00
Eric Blake f67cf661f8 dirty-bitmap: Expose persistent flag to 'query-block'
Since qemu currently doesn't flush persistent bitmaps to disk until
shutdown (which might be MUCH later), it's useful if 'query-block'
at least shows WHICH bitmaps will (eventually) make it to persistent
storage.  Update affected iotests.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190204210512.27458-1-eblake@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-02-19 17:49:43 -05:00
Andrey Shinkevich b8968c875f qcow2: Add list of bitmaps to ImageInfoSpecificQCow2
In the 'Format specific information' section of the 'qemu-img info'
command output, the supplemental information about existing QCOW2
bitmaps will be shown, such as a bitmap name, flags and granularity:

image: /vz/vmprivate/VM1/harddisk.hdd
file format: qcow2
virtual size: 64G (68719476736 bytes)
disk size: 3.0M
cluster_size: 1048576
Format specific information:
    compat: 1.1
    lazy refcounts: true
    bitmaps:
        [0]:
            flags:
                [0]: in-use
                [1]: auto
            name: back-up1
            granularity: 65536
        [1]:
            flags:
                [0]: in-use
                [1]: auto
            name: back-up2
            granularity: 65536
    refcount bits: 16
    corrupt: false

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <1549638368-530182-3-git-send-email-andrey.shinkevich@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-11 14:35:43 -06:00
Andrey Shinkevich 1bf6e9ca92 bdrv_query_image_info Error parameter added
Inform a user in case qcow2_get_specific_info fails to obtain
QCOW2 image specific information. This patch is preliminary to
the one "qcow2: Add list of bitmaps to ImageInfoSpecificQCow2".

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <1549638368-530182-2-git-send-email-andrey.shinkevich@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-11 14:35:43 -06:00
Vladimir Sementsov-Ogievskiy bc5a03350c block/nbd-client: rename read_reply_co to connection_co
This coroutine will serve nbd reconnects, so, rename it to be something
more generic.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190201130138.94525-7-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-04 15:11:27 -06:00
Vladimir Sementsov-Ogievskiy 88ed4e1bf0 block/nbd-client: don't check ioc
We have several paranoid checks for ioc != NULL. But ioc may become
NULL only on close, which should not happen during requests handling.
Also, we check ioc only sometimes, not after each yield, which is
inconsistent. Let's drop these checks. However, for safety, let's leave
asserts instead.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190201130138.94525-6-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-04 15:11:27 -06:00
Vladimir Sementsov-Ogievskiy 65e01d4765 block/nbd-client: fix nbd_reply_chunk_iter_receive
Use exported report, not the variable to be reused (should not really
matter).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190201130138.94525-5-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-04 15:11:27 -06:00
Vladimir Sementsov-Ogievskiy b0e4b5a58f block/nbd-client: split connection from initialization
Split connection code to reuse it for reconnect.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190201130138.94525-4-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-04 15:11:27 -06:00
Vladimir Sementsov-Ogievskiy d42f78e940 block/nbd: move connection code from block/nbd to block/nbd-client
Keep all connection code in one file, to be able to implement reconnect
in further patches.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190201130138.94525-3-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: format tweak]
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-04 15:11:27 -06:00
Vladimir Sementsov-Ogievskiy 7f86068dc1 block/nbd-client: split channel errors from export errors
To implement nbd reconnect in further patches, we need to distinguish
error codes, returned by nbd server, from channel errors, to reconnect
only in the latter case.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190201130138.94525-2-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-04 15:11:27 -06:00
Vladimir Sementsov-Ogievskiy e6798f06a6 nbd: generalize usage of nbd_read
We generally do very similar things around nbd_read: error_prepend
specifying what we have tried to read, and be_to_cpu conversion of
integers.

So, it seems reasonable to move common things to helper functions,
which:
1. simplify code a bit
2. generalize nbd_read error descriptions, all starting with
   "Failed to read"
3. make it more difficult to forget to convert things from BE

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190128165830.165170-1-vsementsov@virtuozzo.com>
[eblake: rename macro to DEF_NBD_READ_N and formatting tweaks;
checkpatch has false positive complaint]
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-04 15:11:27 -06:00
Markus Armbruster 14632122b8 block: Eliminate the S_1KiB, S_2KiB, ... macros
We define 54 macros for the powers of two >= 1024.  We use six, in six
macro definitions.  Four of them could just as well use the common MiB
macro, so do that.  The remaining two can't, because they get passed
to stringify.  Replace the macro by the literal number there.
Slightly harder to read in one instance (1048576 vs. S_1MiB), so add a
comment there.  The other instance is a wash: 65536 vs S_64KiB.  65536
has been good enough for more than seven years there.

This effectively reverts commit 540b849261 and 1240ac558d.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-01 13:46:45 +01:00
Thomas Huth d09ea2d227 block: Remove blk_attach_dev_legacy() / legacy_dev code
The last user of blk_attach_dev_legacy() was the code in xen_disk which
has recently been reworked. Now there is no user for this legacy function
anymore. Thus we can finally remove all code related to the "legacy_dev"
flag, too, and turn the related "void *" in block-backend.c into proper
"DeviceState *" to fix some of the remaining TODOs there.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-01 13:46:45 +01:00
Peter Maydell 1324f06384 uuid: Make qemu_uuid_bswap() take and return a QemuUUID
Currently qemu_uuid_bswap() takes a pointer to the QemuUUID to
be byte-swapped. This means it can't be used when the UUID
to be swapped is in a packed member of a struct. It's also
out of line with the general bswap*() functions we provide
in bswap.h, which take the value to be swapped and return it.

Make qemu_uuid_bswap() take a QemuUUID and return the swapped version.

This fixes some clang warnings about taking the address of
a packed struct member in block/vdi.c.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-01 13:46:45 +01:00
Peter Maydell ac928b8ee8 block/vdi: Don't take address of fields in packed structs
Taking the address of a field in a packed struct is a bad idea, because
it might not be actually aligned enough for that pointer type (and
thus cause a crash on dereference on some host architectures). Newer
versions of clang warn about this.

Instead of passing UUID related functions the address of a possibly
unaligned QemuUUID struct, use local variables and then copy to/from
the struct field as appropriate.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-01 13:46:44 +01:00
Peter Maydell 0dbaaa7981 block/vpc: Don't take address of fields in packed structs
Taking the address of a field in a packed struct is a bad idea, because
it might not be actually aligned enough for that pointer type (and
thus cause a crash on dereference on some host architectures). Newer
versions of clang warn about this. Avoid the bug by generating the
UUID into a local variable which is definitely safely aligned and
then copying it into place.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-01 13:46:44 +01:00
Kevin Wolf 4a960ece17 vmdk: Reject excess extents in blockdev-create
Clarify that the number of extents provided in BlockdevCreateOptionsVmdk
must match the number of extents that will actually be used. Providing
more extents will result in an error now.

This requires adapting the test case to provide the right number of
extents.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
2019-02-01 13:46:44 +01:00
Fam Zheng 3015372dd0 vmdk: Implement .bdrv_co_create callback
This makes VMDK support blockdev-create. The implementation reuses the
image creation code in vmdk_co_create_opts which now acceptes a callback
pointer to "retrieve" BlockBackend pointers from the caller. This way we
separate the logic between file/extent acquisition and initialization.

The QAPI command parameters are mostly the same as the old create_opts
except the dropped legacy @compat6 switch, which is redundant with
@hwversion.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-01 13:46:44 +01:00
Fam Zheng 5be28490ca vmdk: Refactor vmdk_create_extent
The extracted vmdk_init_extent takes a BlockBackend object and
initializes the format metadata. It is the common part between "qemu-img
create" and "blockdev-create".

Add a "BlockBackend *pbb" parameter to vmdk_create_extent, to return the
opened BB to the caller in the next patch.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-01 13:46:44 +01:00
Kevin Wolf 4720cbeea1 block: Fix hangs in synchronous APIs with iothreads
In the block layer, synchronous APIs are often implemented by creating a
coroutine that calls the asynchronous coroutine-based implementation and
then waiting for completion with BDRV_POLL_WHILE().

For this to work with iothreads (more specifically, when the synchronous
API is called in a thread that is not the home thread of the block
device, so that the coroutine will run in a different thread), we must
make sure to call aio_wait_kick() at the end of the operation. Many
places are missing this, so that BDRV_POLL_WHILE() keeps hanging even if
the condition has long become false.

Note that bdrv_dec_in_flight() involves an aio_wait_kick() call. This
corresponds to the BDRV_POLL_WHILE() in the drain functions, but it is
generally not enough for most other operations because they haven't set
the return value in the coroutine entry stub yet. To avoid race
conditions there, we need to kick after setting the return value.

The race window is small enough that the problem doesn't usually surface
in the common path. However, it does surface and causes easily
reproducible hangs if the operation can return early before even calling
bdrv_inc/dec_in_flight, which many of them do (trivial error or no-op
success paths).

The bug in bdrv_truncate(), bdrv_check() and bdrv_invalidate_cache() is
slightly different: These functions even neglected to schedule the
coroutine in the home thread of the node. This avoids the hang, but is
obviously wrong, too. Fix those to schedule the coroutine in the right
AioContext in addition to adding aio_wait_kick() calls.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-01 13:46:44 +01:00
Markus Armbruster 4e20c1becb block: Replace qdict_put() by qdict_put_obj() where appropriate
Patch created mechanically by rerunning:

  $  spatch --sp-file scripts/coccinelle/qobject.cocci \
	    --macro-file scripts/cocci-macro-file.h \
	    --dir block --in-place

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-01 13:46:44 +01:00
Alberto Garcia cdc674c736 qcow2: Assert that refcount block offsets fit in the refcount table
Refcount table entries have a field to store the offset of the
refcount block. The rest of the bits of the entry are currently
reserved.

The offset is always taken from the entry using REFT_OFFSET_MASK to
ensure that we only use the bits that belong to that field.

While that mask is used every time we read from the refcount table, it
is never used when we write to it. Due to the other constraints of the
qcow2 format QEMU can never produce refcount block offsets that don't
fit in that field so any such offset when allocating a refcount block
would indicate a bug in QEMU.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-01 13:46:44 +01:00
Alberto Garcia 67b24427fe mirror: Block the source BlockDriverState in mirror_start_job()
The mirror_start_job() function used for the commit-active job blocks
the source, target and all intermediate nodes for the duration of the
job.

   target <- intermediate <- source

Since 4ef85a9c23 this function creates a dummy mirror_top_bs that
goes on top of the source node, and it is this dummy node that gets
blocked instead. The source node is never blocked or added to the
job's list of nodes.

   target <- intermediate <- source <- mirror_top

At the moment I don't think it is possible to exploit this problem
because any additional job on 'source' would either be forbidden for
other reasons or it would need to involve an additional node that is
blocked, causing an error.

This can be seen in the error messages, however, because they never
refer to the source node being blocked:

  $ qemu-img create -f qcow2 hd0.qcow2 1M
  $ qemu-img create -f qcow2 -b hd0.qcow2 hd1.qcow2
  $ qemu-io -c 'write 0 1M' hd0.qcow2
  $ $QEMU -drive if=none,file=hd1.qcow2,node-name=hd1
  { "execute": "qmp_capabilities" }
  { "execute": "block-commit", "arguments": {"device": "hd1", "speed": 256}}
  { "execute": "block-stream", "arguments": {"device": "hd1"}}
  { "error": {"class": "GenericError",
    "desc": "Node 'hd0' is busy: block device is in use by block job: commit"}}

After this patch the error message refers to 'hd1', as it should.

The expected output of iotest 141 also needs to be updated for the
same reason.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-01 13:46:44 +01:00
Alberto Garcia e917e2cb2a mirror: Release the dirty bitmap if mirror_start_job() fails
At the moment I don't see how to make this function fail after the
dirty bitmap has been created, but if that was possible then we would
hit the assert(QLIST_EMPTY(&bs->dirty_bitmaps)) in bdrv_close().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-01 13:46:44 +01:00
Laurent Vivier 70018a149c block/sheepdog: Convert from DPRINTF() macro to trace events
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20181213162727.17438-5-lvivier@redhat.com
[mreitz: Fixed sheepdog_snapshot_create_inode's format string to use
         PRIx32 for uint32_ts]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-01-31 00:38:19 +01:00
Laurent Vivier 4f7d28d738 block/file-posix: Convert from DPRINTF() macro to trace events
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20181213162727.17438-4-lvivier@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-01-31 00:38:19 +01:00
Laurent Vivier ed2a66deb9 block/curl: Convert from DPRINTF() macro to trace events
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20181213162727.17438-3-lvivier@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-01-31 00:38:19 +01:00
Laurent Vivier 023908a24d block/ssh: Convert from DPRINTF() macro to trace events
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20181213162727.17438-2-lvivier@redhat.com
[mreitz: Fixed type of ssh_{read,write}_return's parameter to be ssize_t
         instead of size_t]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-01-31 00:38:19 +01:00
Vladimir Sementsov-Ogievskiy 5d3b4e9946 qapi: add x-debug-query-block-graph
Add a new command, returning block nodes (and their users) graph.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20181221170909.25584-2-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-01-31 00:38:19 +01:00
Stefan Hajnoczi bc19a0a6e4 throttle-groups: fix restart coroutine iothread race
The following QMP command leads to a crash when iothreads are used:

  { 'execute': 'device_del', 'arguments': {'id': 'data'} }

The backtrace involves the queue restart coroutine where
tgm->throttle_state is a NULL pointer because
throttle_group_unregister_tgm() has already been called:

  (gdb) bt full
  #0  0x00005585a7a3b378 in qemu_mutex_lock_impl (mutex=0xffffffffffffffd0, file=0x5585a7bb3d54 "block/throttle-groups.c", line=412) at util/qemu-thread-posix.c:64
        err = <optimized out>
        __PRETTY_FUNCTION__ = "qemu_mutex_lock_impl"
        __func__ = "qemu_mutex_lock_impl"
  #1  0x00005585a79be074 in throttle_group_restart_queue_entry (opaque=0x5585a9de4eb0) at block/throttle-groups.c:412
        _f = <optimized out>
        data = 0x5585a9de4eb0
        tgm = 0x5585a9079440
        ts = 0x0
        tg = 0xffffffffffffff98
        is_write = false
        empty_queue = 255

This coroutine should not execute in the iothread after the throttle
group member has been unregistered!

The root cause is that the device_del code path schedules the restart
coroutine in the iothread while holding the AioContext lock.  Therefore
the iothread cannot execute the coroutine until after device_del
releases the lock - by this time it's too late.

This patch adds a reference count to ThrottleGroupMember so we can
synchronously wait for restart coroutines to complete.  Once they are
done it is safe to unregister the ThrottleGroupMember.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190114133257.30299-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-01-24 10:02:28 +00:00
Peter Maydell 952bc8b3c2 nbd patches for 2019-01-21
Add 'qemu-nbd --list' for probing a remote NBD server's advertisements.
 
 - Eric Blake: 0/21 nbd: add qemu-nbd --list
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJcRktLAAoJEKeha0olJ0NqY8cIAJqcHDAuk1GwcDE/NIAASY9E
 2PsaEL2pM65bEWUwsnVkFyRb4y4S4QX9VtAR/c9G6jR0YHuDvCCz7VGjiHh3eGso
 9AWKzuh5C4mayAN5dTOlbjryOx7hlcz8wDDjP5OQNqIlBvmWVAnLeh5Kkqlc7KDk
 f1SH8kJfqegDAhKDeDhi/HGlL1UVzMr5A2sR6I1wbb5IbBru5JtAKzgqAmdBH893
 lWZ9EDXqajwPCo8ASwZNyawmtmjx+VBeFjO2juA3qTWvf262roiB3XT5W/3n/CCb
 Er5CBwQQvSFBUfF9bRYDziyjwQwTXoIQJ6nuJLzPGx2cUpQbJ/svphoRQ6rBOjY=
 =gywW
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2019-01-21' into staging

nbd patches for 2019-01-21

Add 'qemu-nbd --list' for probing a remote NBD server's advertisements.

- Eric Blake: 0/21 nbd: add qemu-nbd --list

# gpg: Signature made Mon 21 Jan 2019 22:44:27 GMT
# gpg:                using RSA key A7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>"
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>"
# gpg:                 aka "[jpeg image of size 6874]"
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-nbd-2019-01-21: (21 commits)
  iotests: Enhance 223, 233 to cover 'qemu-nbd --list'
  nbd/client: Work around 3.0 bug for listing meta contexts
  qemu-nbd: Add --list option
  nbd/client: Add meta contexts to nbd_receive_export_list()
  nbd/client: Add nbd_receive_export_list()
  nbd/client: Refactor nbd_opt_go() to support NBD_OPT_INFO
  nbd/client: Pull out oldstyle size determination
  nbd/client: Split handshake into two functions
  nbd/client: Refactor return of nbd_receive_negotiate()
  nbd/client: Split out nbd_receive_one_meta_context()
  nbd/client: Split out nbd_send_meta_query()
  nbd/client: Change signature of nbd_negotiate_simple_meta_context()
  nbd/client: Move export name into NBDExportInfo
  nbd/client: Refactor nbd_receive_list()
  qemu-nbd: Avoid strtol open-coding
  nbd/server: Favor [u]int64_t over off_t
  nbd/server: Hoist length check to qmp_nbd_server_add
  qemu-nbd: Sanity check partition bounds
  qemu-nbd: Enhance man page
  maint: Allow for EXAMPLES in texi2pod
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-01-22 17:56:21 +00:00
Thomas Huth df71ca84e4 block: Work-around a bug in libiscsi 1.9.0 when used in gnu99 mode
The header "scsi-lowlevel.h" of libiscsi 1.9.0 contains some bad
"inline" prototype definitions which GCC refuses to compile in its
gnu99 mode:

In file included from block/iscsi.c:52:0:
/usr/include/iscsi/scsi-lowlevel.h:810:13: error: inline function
‘scsi_set_uint16’ declared but never defined [-Werror]
 inline void scsi_set_uint16(unsigned char *c, uint16_t val);
             ^
/usr/include/iscsi/scsi-lowlevel.h:809:13: error: inline function
‘scsi_set_uint32’ declared but never defined [-Werror]
 inline void scsi_set_uint32(unsigned char *c, uint32_t val);
             ^
[...]

This has been fixed by upstream libiscsi in version 1.10.0 (see
https://github.com/sahlberg/libiscsi/commit/7692027d6c11 ), but
since we still want to support 1.9.0 for CentOS 7 / RHEL7, we
have to work-around the issue by redefining the "inline" keyword
to use the old "gnu89" mode behavior via "gnu_inline" instead.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2019-01-22 06:26:32 +01:00
Eric Blake 2df94eb52b nbd/client: Change signature of nbd_negotiate_simple_meta_context()
Pass 'info' instead of three separate parameters related to info,
when requesting the server to set the meta context.  Update the
NBDExportInfo struct to rename the received id field to match the
fact that we are currently overloading the field to match whatever
context the user supplied through the x-dirty-bitmap hack, as well
as adding a TODO comment to remind future patches about a desire
to request two contexts at once.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190117193658.16413-11-eblake@redhat.com>
2019-01-21 15:49:52 -06:00
Eric Blake 6dc1667d68 nbd/client: Move export name into NBDExportInfo
Refactor the 'name' parameter of nbd_receive_negotiate() from
being a separate parameter into being part of the in-out 'info'.
This also spills over to a simplification of nbd_opt_go().

The main driver for this refactoring is that an upcoming patch
would like to add support to qemu-nbd to list information about
all exports available on a server, where the name(s) will be
provided by the server instead of the client.  But another benefit
is that we can now allow the client to explicitly specify the
empty export name "" even when connecting to an oldstyle server
(even if qemu is no longer such a server after commit 7f7dfe2a).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190117193658.16413-10-eblake@redhat.com>
2019-01-21 15:49:52 -06:00
Peter Maydell 51c1c13560 pci, pc, virtio: fixes, features
tpm physical presence interface
 rsc support in virtio net
 ivshmem is removed
 misc cleanups and fixes all over the place
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJcQTW/AAoJECgfDbjSjVRpY34H/0wniUqht3CVVihAfC9qnL7N
 jgAJ41OGgrrCmXLrwa5P5iWxIKRjp2Odo7DQ2dl+9sqYshe6WoBBXmqEa8bDeldE
 69EJ3xSnZjkpScqXKoSaDO3dtibUGpdvlTKpjuh7q2FFBOvNvf3hQubhyt/vINsO
 qoDMqycjXJkHSaZLfdNb3P/LAkAuHI4veNk8KEbog3XTijSUbZlb2GYc8vT/agka
 0SldiEVi9CTIh7co0L2BkcLVeTMVrF3OZuWDek3tVHXPxPaNbQM3GMvHZaxqZusU
 SZinnNhcnI++kgLZVHuurxo7e1S9d8Oy0QuHGX8CZ1sfeqy3vzxer0Hbb7BdLIY=
 =GhIM
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pci, pc, virtio: fixes, features

tpm physical presence interface
rsc support in virtio net
ivshmem is removed
misc cleanups and fixes all over the place

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Fri 18 Jan 2019 02:11:11 GMT
# gpg:                using RSA key 281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream: (49 commits)
  migration: Use strnlen() for fixed-size string
  migration: Fix stringop-truncation warning
  hw/acpi: Use QEMU_NONSTRING for non NUL-terminated arrays
  block/sheepdog: Use QEMU_NONSTRING for non NUL-terminated arrays
  qemu/compiler: Define QEMU_NONSTRING
  acpi: update expected files
  hw: acpi: Fix memory hotplug AML generation error
  tpm: clear RAM when "memory overwrite" requested
  acpi: add ACPI memory clear interface
  acpi: build TPM Physical Presence interface
  acpi: expose TPM/PPI configuration parameters to firmware via fw_cfg
  tpm: allocate/map buffer for TPM Physical Presence interface
  tpm: add a "ppi" boolean property
  hw/misc/edu: add msi_uninit() for pci_edu_uninit()
  virtio: Make disable-legacy/disable-modern compat properties optional
  globals: Allow global properties to be optional
  virtio: virtio 9p really requires CONFIG_VIRTFS to work
  virtio: split virtio crypto bits from virtio-pci.h
  virtio: split virtio gpu bits from virtio-pci.h
  virtio: split virtio serial bits from virtio-pci
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-01-18 14:58:58 +00:00
Philippe Mathieu-Daudé 97b583f46c block/sheepdog: Use QEMU_NONSTRING for non NUL-terminated arrays
GCC 8 added a -Wstringop-truncation warning:

  The -Wstringop-truncation warning added in GCC 8.0 via r254630 for
  bug 81117 is specifically intended to highlight likely unintended
  uses of the strncpy function that truncate the terminating NUL
  character from the source string.

This new warning leads to compilation failures:

    CC      block/sheepdog.o
  qemu/block/sheepdog.c: In function 'find_vdi_name':
  qemu/block/sheepdog.c:1239:5: error: 'strncpy' specified bound 256 equals destination size [-Werror=stringop-truncation]
       strncpy(buf + SD_MAX_VDI_LEN, tag, SD_MAX_VDI_TAG_LEN);
       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  make: *** [qemu/rules.mak:69: block/sheepdog.o] Error 1

As described previous to the strncpy() calls, the use of strncpy() is
correct here:

    /* This pair of strncpy calls ensures that the buffer is zero-filled,
     * which is desirable since we'll soon be sending those bytes, and
     * don't want the send_req to read uninitialized data.
     */
    strncpy(buf, filename, SD_MAX_VDI_LEN);
    strncpy(buf + SD_MAX_VDI_LEN, tag, SD_MAX_VDI_TAG_LEN);

Use the QEMU_NONSTRING attribute, since this array is intended to store
character arrays that do not necessarily contain a terminating NUL.

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-01-17 21:10:57 -05:00
Vladimir Sementsov-Ogievskiy 19c021e194 Revert "hbitmap: Add @advance param to hbitmap_iter_next()"
This reverts commit a33fbb4f8b.

The functionality is unused.

Note: in addition to automatic revert, drop second parameter in
hbitmap_iter_next() call from hbitmap_next_dirty_area() too.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2019-01-15 18:26:50 -05:00
Vladimir Sementsov-Ogievskiy 166cd55125 Revert "block/dirty-bitmap: Add bdrv_dirty_iter_next_area"
This reverts commit 72d10a9421.

The function is unused now.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2019-01-15 18:26:50 -05:00
Vladimir Sementsov-Ogievskiy 1eaf1b0fdf block/mirror: fix and improve do_sync_target_write
Use bdrv_dirty_bitmap_next_dirty_area() instead of
bdrv_dirty_iter_next_area(), because of the following problems of
bdrv_dirty_iter_next_area():

1. Using HBitmap iterators we should carefully handle unaligned offset,
as first call to hbitmap_iter_next() may return a value less than
original offset (actually, it will be original offset rounded down to
bitmap granularity). This handling is not done in
do_sync_target_write().

2. bdrv_dirty_iter_next_area() handles unaligned max_offset
incorrectly:

look at the code:
    if (max_offset == iter->bitmap->size) {
        /* If max_offset points to the image end, round it up by the
         * bitmap granularity */
        gran_max_offset = ROUND_UP(max_offset, granularity);
    } else {
        gran_max_offset = max_offset;
    }

    ret = hbitmap_iter_next(&iter->hbi, false);
    if (ret < 0 || ret + granularity > gran_max_offset) {
        return false;
    }

and assume that max_offset != iter->bitmap->size but still unaligned.
if 0 < ret < max_offset we found dirty area, but the function can
return false in this case (if ret + granularity > max_offset).

3. bdrv_dirty_iter_next_area() uses inefficient loop to find the end of
the dirty area. Let's use more efficient hbitmap_next_zero instead
(bdrv_dirty_bitmap_next_dirty_area() do so)

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-01-15 18:26:50 -05:00
Vladimir Sementsov-Ogievskiy a78a1a48cd dirty-bitmap: add bdrv_dirty_bitmap_next_dirty_area
The function alters bdrv_dirty_iter_next_area(), which is wrong and
less efficient (see further commit
"block/mirror: fix and improve do_sync_target_write" for description).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-01-15 18:26:50 -05:00
Vladimir Sementsov-Ogievskiy 76d570dc49 dirty-bitmap: improve bdrv_dirty_bitmap_next_zero
Add bytes parameter to the function, to limit searched range.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-01-15 18:26:49 -05:00
John Snow 07d5a8df6a block/dirty-bitmap: remove assertion from restore
When making a backup of a dirty bitmap (for transactions), we want to
restore that backup whether or not the bitmap is enabled.

It is perfectly valid to write into bitmaps that are disabled. It is
only illegitimate for the guest to have done so.

Remove this assertion.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20181221093529.23855-3-jsnow@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-01-14 10:09:46 -06:00
Paolo Bonzini 7d37435bd5 avoid TABs in files that only contain a few
Most files that have TABs only contain a handful of them.  Change
them to spaces so that we don't confuse people.

disas, standard-headers, linux-headers and libdecnumber are imported
from other projects and probably should be exempted from the check.
Outside those, after this patch the following files still contain both
8-space and TAB sequences at the beginning of the line.  Many of them
have a majority of TABs, or were initially committed with all tabs.

    bsd-user/i386/target_syscall.h
    bsd-user/x86_64/target_syscall.h
    crypto/aes.c
    hw/audio/fmopl.c
    hw/audio/fmopl.h
    hw/block/tc58128.c
    hw/display/cirrus_vga.c
    hw/display/xenfb.c
    hw/dma/etraxfs_dma.c
    hw/intc/sh_intc.c
    hw/misc/mst_fpga.c
    hw/net/pcnet.c
    hw/sh4/sh7750.c
    hw/timer/m48t59.c
    hw/timer/sh_timer.c
    include/crypto/aes.h
    include/disas/bfd.h
    include/hw/sh4/sh.h
    libdecnumber/decNumber.c
    linux-headers/asm-generic/unistd.h
    linux-headers/linux/kvm.h
    linux-user/alpha/target_syscall.h
    linux-user/arm/nwfpe/double_cpdo.c
    linux-user/arm/nwfpe/fpa11_cpdt.c
    linux-user/arm/nwfpe/fpa11_cprt.c
    linux-user/arm/nwfpe/fpa11.h
    linux-user/flat.h
    linux-user/flatload.c
    linux-user/i386/target_syscall.h
    linux-user/ppc/target_syscall.h
    linux-user/sparc/target_syscall.h
    linux-user/syscall.c
    linux-user/syscall_defs.h
    linux-user/x86_64/target_syscall.h
    slirp/cksum.c
    slirp/if.c
    slirp/ip.h
    slirp/ip_icmp.c
    slirp/ip_icmp.h
    slirp/ip_input.c
    slirp/ip_output.c
    slirp/mbuf.c
    slirp/misc.c
    slirp/sbuf.c
    slirp/socket.c
    slirp/socket.h
    slirp/tcp_input.c
    slirp/tcpip.h
    slirp/tcp_output.c
    slirp/tcp_subr.c
    slirp/tcp_timer.c
    slirp/tftp.c
    slirp/udp.c
    slirp/udp.h
    target/cris/cpu.h
    target/cris/mmu.c
    target/cris/op_helper.c
    target/sh4/helper.c
    target/sh4/op_helper.c
    target/sh4/translate.c
    tcg/sparc/tcg-target.inc.c
    tests/tcg/cris/check_addo.c
    tests/tcg/cris/check_moveq.c
    tests/tcg/cris/check_swap.c
    tests/tcg/multiarch/test-mmap.c
    ui/vnc-enc-hextile-template.h
    ui/vnc-enc-zywrle.h
    util/envlist.c
    util/readline.c

The following have only TABs:

    bsd-user/i386/target_signal.h
    bsd-user/sparc64/target_signal.h
    bsd-user/sparc64/target_syscall.h
    bsd-user/sparc/target_signal.h
    bsd-user/sparc/target_syscall.h
    bsd-user/x86_64/target_signal.h
    crypto/desrfb.c
    hw/audio/intel-hda-defs.h
    hw/core/uboot_image.h
    hw/sh4/sh7750_regnames.c
    hw/sh4/sh7750_regs.h
    include/hw/cris/etraxfs_dma.h
    linux-user/alpha/termbits.h
    linux-user/arm/nwfpe/fpopcode.h
    linux-user/arm/nwfpe/fpsr.h
    linux-user/arm/syscall_nr.h
    linux-user/arm/target_signal.h
    linux-user/cris/target_signal.h
    linux-user/i386/target_signal.h
    linux-user/linux_loop.h
    linux-user/m68k/target_signal.h
    linux-user/microblaze/target_signal.h
    linux-user/mips64/target_signal.h
    linux-user/mips/target_signal.h
    linux-user/mips/target_syscall.h
    linux-user/mips/termbits.h
    linux-user/ppc/target_signal.h
    linux-user/sh4/target_signal.h
    linux-user/sh4/termbits.h
    linux-user/sparc64/target_syscall.h
    linux-user/sparc/target_signal.h
    linux-user/x86_64/target_signal.h
    linux-user/x86_64/termbits.h
    pc-bios/optionrom/optionrom.h
    slirp/mbuf.h
    slirp/misc.h
    slirp/sbuf.h
    slirp/tcp.h
    slirp/tcp_timer.h
    slirp/tcp_var.h
    target/i386/svm.h
    target/sparc/asi.h
    target/xtensa/core-dc232b/xtensa-modules.inc.c
    target/xtensa/core-dc233c/xtensa-modules.inc.c
    target/xtensa/core-de212/core-isa.h
    target/xtensa/core-de212/xtensa-modules.inc.c
    target/xtensa/core-fsf/xtensa-modules.inc.c
    target/xtensa/core-sample_controller/core-isa.h
    target/xtensa/core-sample_controller/xtensa-modules.inc.c
    target/xtensa/core-test_kc705_be/core-isa.h
    target/xtensa/core-test_kc705_be/xtensa-modules.inc.c
    tests/tcg/cris/check_abs.c
    tests/tcg/cris/check_addc.c
    tests/tcg/cris/check_addcm.c
    tests/tcg/cris/check_addoq.c
    tests/tcg/cris/check_bound.c
    tests/tcg/cris/check_ftag.c
    tests/tcg/cris/check_int64.c
    tests/tcg/cris/check_lz.c
    tests/tcg/cris/check_openpf5.c
    tests/tcg/cris/check_sigalrm.c
    tests/tcg/cris/crisutils.h
    tests/tcg/cris/sys.c
    tests/tcg/i386/test-i386-ssse3.c
    ui/vgafont.h

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20181213223737.11793-3-pbonzini@redhat.com>
Reviewed-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Eric Blake <eblake@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Stefan Markovic <smarkovic@wavecomp.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11 15:46:56 +01:00