Commit Graph

3419 Commits

Author SHA1 Message Date
Alberto Garcia 91203f08f0 qcow2: Allow 'cache-clean-interval' in Linux only
The cache-clean-interval option of qcow2 only works on Linux. However
we allow setting it in other systems regardless of whether it works or
not.

In those systems this option is not simply a no-op: it actually
invalidates perfectly valid cache tables for no good reason without
freeing their memory.

This patch forbids using that option in non-Linux systems.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-11-25 13:51:30 +01:00
Alberto Garcia 2f2c8d6b37 qcow2: Make qcow2_cache_table_release() work only in Linux
We are using QEMU_MADV_DONTNEED to discard the memory of individual L2
cache tables. The problem with this is that those semantics are
specific to the Linux madvise() system call. Other implementations of
madvise() (including the very Linux implementation of posix_madvise())
don't do that, so we cannot use them for the same purpose.

This patch makes the code Linux-specific and uses madvise() directly
since there's no point in going through qemu_madvise() for this.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-11-25 13:51:30 +01:00
Stefan Hajnoczi f0c10c392f Small fixes for rc1.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJYNMYwAAoJEL/70l94x66DBrUIAKeNK59lTbUm1WVl15nyB2qM
 jE2804Kcp+EGTwFHeo5GGsb+CplK54uMzHq2wzN6G3EmnaV3xbbdiZ7cmNl5Q6Tr
 qq7/pAer/T+xvQ3iDOTkAvJcqiMUZIx+MXrFED46KBUtqANJ2tAg2uEEqbI0RbOU
 +qtMZlPxo3IOuYnVROug1PPdNQDluBvZjrCYtb7VfZNo13u2UGYmRjZttobVfihF
 AQjv57uiawPs2e3VmUvIH8fjjEgV4MlPLiilL1eYsLaszjIBgdfrQOO7bdfetLo8
 THkNJEZTpS9T9ChcbcTKS7yovI3OiIxPMwyftELClacX3wVtSie2WNx0sj/3Xpw=
 =DPxR
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'bonzini/tags/for-upstream' into staging

Small fixes for rc1.

# gpg: Signature made Tue 22 Nov 2016 10:26:56 PM GMT
# gpg:                using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* bonzini/tags/for-upstream:
  scsi/esp: do not raise an interrupt when reading the FIFO register
  nbd: Allow unmap and fua during write zeroes
  cpu_ldst.h: use correct guest address parameter

Message-id: 1479853676-35995-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-11-23 11:44:29 +00:00
Eric Blake 169407e1f7 nbd: Allow unmap and fua during write zeroes
Commit fa778fff wired up support to send the NBD_CMD_WRITE_ZEROES,
but forgot to inform the block layer that FUA unmapping of zeroes is
supported.  Without BDRV_REQ_MAY_UNMAP listed as a supported flag,
the block layer will always insist on the NBD layer passing
NBD_CMD_FLAG_NO_HOLE, resulting in the server always allocating
things even when it was desired to let the server punch holes.
Similarly, failing to set BDRV_REQ_FUA means that the client may
send unnecessary NBD_CMD_FLUSH when it could have instead used the
NBD_CMD_FLAG_FUA bit.

CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1479413642-22463-2-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-22 23:26:51 +01:00
Eric Blake 3482b9bc41 block: Pass unaligned discard requests to drivers
Discard is advisory, so rounding the requests to alignment
boundaries is never semantically wrong from the data that
the guest sees.  But at least the Dell Equallogic iSCSI SANs
has an interesting property that its advertised discard
alignment is 15M, yet documents that discarding a sequence
of 1M slices will eventually result in the 15M page being
marked as discarded, and it is possible to observe which
pages have been discarded.

Between commits 9f1963b and b8d0a980, we converted the block
layer to a byte-based interface that ultimately ignores any
unaligned head or tail based on the driver's advertised
discard granularity, which means that qemu 2.7 refuses to
pass any discard request smaller than 15M down to the Dell
Equallogic hardware.  This is a slight regression in behavior
compared to earlier qemu, where a guest executing discards
in power-of-2 chunks used to be able to get every page
discarded, but is now left with various pages still allocated
because the guest requests did not align with the hardware's
15M pages.

Since the SCSI specification says nothing about a minimum
discard granularity, and only documents the preferred
alignment, it is best if the block layer gives the driver
every bit of information about discard requests, rather than
rounding it to alignment boundaries early.

Rework the block layer discard algorithm to mirror the write
zero algorithm: always peel off any unaligned head or tail
and manage that in isolation, then do the bulk of the request
on an aligned boundary.  The fallback when the driver returns
-ENOTSUP for an unaligned request is to silently ignore that
portion of the discard request; but for devices that can pass
the partial request all the way down to hardware, this can
result in the hardware coalescing requests and discarding
aligned pages after all.

Reported by: Peter Lieven <pl@kamp.de>
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-11-22 15:59:23 +01:00
Eric Blake 49228d1e95 block: Return -ENOTSUP rather than assert on unaligned discards
Right now, the block layer rounds discard requests, so that
individual drivers are able to assert that discard requests
will never be unaligned.  But there are some ISCSI devices
that track and coalesce multiple unaligned requests, turning it
into an actual discard if the requests eventually cover an
entire page, which implies that it is better to always pass
discard requests as low down the stack as possible.

In isolation, this patch has no semantic effect, since the
block layer currently never passes an unaligned request through.
But the block layer already has code that silently ignores
drivers that return -ENOTSUP for a discard request that cannot
be honored (as well as drivers that return 0 even when nothing
was done).  But the next patch will update the block layer to
fragment discard requests, so that clients are guaranteed that
they are either dealing with an unaligned head or tail, or an
aligned core, making it similar to the block layer semantics of
write zero fragmentation.

CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-11-22 15:59:22 +01:00
Eric Blake b2f95feec5 block: Let write zeroes fallback work even with small max_transfer
Commit 443668ca rewrote the write_zeroes logic to guarantee that
an unaligned request never crosses a cluster boundary.  But
in the rewrite, the new code assumed that at most one iteration
would be needed to get to an alignment boundary.

However, it is easy to trigger an assertion failure: the Linux
kernel limits loopback devices to advertise a max_transfer of
only 64k.  Any operation that requires falling back to writes
rather than more efficient zeroing must obey max_transfer during
that fallback, which means an unaligned head may require multiple
iterations of the write fallbacks before reaching the aligned
boundaries, when layering a format with clusters larger than 64k
atop the protocol of file access to a loopback device.

Test case:

$ qemu-img create -f qcow2 -o cluster_size=1M file 10M
$ losetup /dev/loop2 /path/to/file
$ qemu-io -f qcow2 /dev/loop2
qemu-io> w 7m 1k
qemu-io> w -z 8003584 2093056

In fairness to Denis (as the original listed author of the culprit
commit), the faulty logic for at most one iteration is probably all
my fault in reworking his idea.  But the solution is to restore what
was in place prior to that commit: when dealing with an unaligned
head or tail, iterate as many times as necessary while fragmenting
the operation at max_transfer boundaries.

Reported-by: Ed Swierk <eswierk@skyportsystems.com>
CC: qemu-stable@nongnu.org
CC: Denis V. Lunev <den@openvz.org>
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-11-22 15:59:22 +01:00
Eric Blake ecdbead659 qcow2: Inform block layer about discard boundaries
At the qcow2 layer, discard is only possible on a per-cluster
basis; at the moment, qcow2 silently rounds any unaligned
requests to this granularity.  However, an upcoming patch will
fix a regression in the block layer ignoring too much of an
unaligned discard request, by changing the block layer to
break up a discard request at alignment boundaries; for that
to work, the block layer must know about our limits.

However, we can't go one step further by changing
qcow2_discard_clusters() to assert that requests are always
aligned, since that helper function is reached on paths
outside of the block layer.

CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-11-22 15:59:22 +01:00
Kevin Wolf 668c0e441d gluster: Fix use after free in glfs_clear_preopened()
This fixes a use-after-free bug introduced in commit 6349c154. We need
to use QLIST_FOREACH_SAFE() when freeing elements in the loop. Spotted
by Coverity.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1479378608-11962-1-git-send-email-kwolf@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-21 17:04:43 -05:00
Paolo Bonzini bdffb31d8e mirror: do not flush every time the disks are synced
This puts a huge strain on the disks when there are many concurrent
migrations.  With this patch we only flush twice: just before issuing
the event, and just before pivoting to the destination.  If management
will complete the job close to the BLOCK_JOB_READY event, the cost of
the second flush should be small anyway.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20161109162008.27287-2-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-14 22:49:26 -05:00
Max Reitz 4e504535c1 block/curl: Do not wait for data beyond EOF
libcurl will only give us as much data as there is, not more. The block
layer will deny requests beyond the end of file for us; but since this
block driver is still using a sector-based interface, we can still get
in trouble if the file size is not a multiple of 512.

While we have already made sure not to attempt transfers beyond the end
of the file, we are currently still trying to receive data from there if
the original request exceeds the file size. This patch fixes this issue
and invokes qemu_iovec_memset() on the iovec's tail.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20161025025431.24714-5-mreitz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-14 22:47:34 -05:00
Max Reitz ff5ca1664a block/curl: Remember all sockets
For some connection types (like FTP, generally), more than one socket
may be used (in FTP's case: control vs. data stream). As of commit
838ef60249 ("curl: Eliminate unnecessary
use of curl_multi_socket_all"), we have to remember all of the sockets
used by libcurl, but in fact we only did that for a single one. Since
one libcurl connection may use multiple sockets, however, we have to
remember them all.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20161025025431.24714-4-mreitz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-14 22:47:34 -05:00
Max Reitz 4e7676571b block/curl: Fix return value from curl_read_cb
While commit 38bbc0a580 is correct in that
the callback is supposed to return the number of bytes handled; what it
does not mention is that libcurl will throw an error if the callback did
not "handle" all of the data passed to it.

Therefore, if the callback receives some data that it cannot handle
(either because the receive buffer has not been set up yet or because it
would not fit into the receive buffer) and we have to ignore it, we
still have to report that the data has been handled.

Obviously, this should not happen normally. But it does happen at least
for FTP connections where some data (that we do not expect) may be
generated when the connection is established.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20161025025431.24714-3-mreitz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-14 22:47:34 -05:00
Max Reitz 9054d9f6b0 block/curl: Use BDRV_SECTOR_SIZE
Currently, curl defines its own constant SECTOR_SIZE. There is no
advantage over using the global BDRV_SECTOR_SIZE, so drop it.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20161025025431.24714-2-mreitz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-14 22:47:34 -05:00
Max Reitz 23dce3873f block/curl: Drop TFTP "support"
Because TFTP does not support byte ranges, it was never usable with our
curl block driver. Since apparently nobody has ever complained loudly
enough for someone to take care of the issue until now, it seems
reasonable to assume that nobody has ever actually used it.

Therefore, it should be safe to just drop it from curl's protocol list.

[Jeff Cody: Below is additional summary pulled, with some rewording,
            from followup emails between Max and Markus, to explain what
            worked and what didn't]

TFTP would sometimes work, to a limited extent, for images <= the curl
"readahead" size, so long as reads started at offset zero.  By default,
that readahead size is 256KB.

Reads starting at a non-zero offset would also have returned data from a
zero offset.  It can become more complicated still, with mixed reads at
zero offset and non-zero offsets, due to data buffering.

In short, TFTP could only have worked before in very specific scenarios
with unrealistic expectations and constraints.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20161102175539.4375-4-mreitz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-14 22:47:34 -05:00
John Snow 111049a4ec blockjob: refactor backup_start as backup_job_create
Refactor backup_start as backup_job_create, which only creates the job,
but does not automatically start it. The old interface, 'backup_start',
is not kept in favor of limiting the number of nearly-identical interfaces
that would have to be edited to keep up with QAPI changes in the future.

Callers that wish to synchronously start the backup_block_job can
instead just call block_job_start immediately after calling
backup_job_create.

Transactions are updated to use the new interface, calling block_job_start
only during the .commit phase, which helps prevent race conditions where
jobs may finish before we even finish building the transaction. This may
happen, for instance, during empty block backup jobs.

Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1478587839-9834-6-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-14 22:47:34 -05:00
John Snow 5ccac6f186 blockjob: add block_job_start
Instead of automatically starting jobs at creation time via backup_start
et al, we'd like to return a job object pointer that can be started
manually at later point in time.

For now, add the block_job_start mechanism and start the jobs
automatically as we have been doing, with conversions job-by-job coming
in later patches.

Of note: cancellation of unstarted jobs will perform all the normal
cleanup as if the job had started, particularly abort and clean. The
only difference is that we will not emit any events, because the job
never actually started.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1478587839-9834-5-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-14 22:47:34 -05:00
John Snow a7815a764c blockjob: add .start field
Add an explicit start field to specify the entrypoint. We already have
ownership of the coroutine itself AND managing the lifetime of the
coroutine, let's take control of creation of the coroutine, too.

This will allow us to delay creation of the actual coroutine until we
know we'll actually start a BlockJob in block_job_start. This avoids
the sticky question of how to "un-create" a Coroutine that hasn't been
started yet.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1478587839-9834-4-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-14 22:47:34 -05:00
John Snow e8a40bf71d blockjob: add .clean property
Cleaning up after we have deferred to the main thread but before the
transaction has converged can be dangerous and result in deadlocks
if the job cleanup invokes any BH polling loops.

A job may attempt to begin cleaning up, but may induce another job to
enter its cleanup routine. The second job, part of our same transaction,
will block waiting for the first job to finish, so neither job may now
make progress.

To rectify this, allow jobs to register a cleanup operation that will
always run regardless of if the job was in a transaction or not, and
if the transaction job group completed successfully or not.

Move sensitive cleanup to this callback instead which is guaranteed to
be run only after the transaction has converged, which removes sensitive
timing constraints from said cleanup.

Furthermore, in future patches these cleanup operations will be performed
regardless of whether or not we actually started the job. Therefore,
cleanup callbacks should essentially confine themselves to undoing create
operations, e.g. setup actions taken in what is now backup_start.

Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1478587839-9834-3-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-14 22:47:34 -05:00
Stefan Hajnoczi 682df581c6 -----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYKeNwAAoJEH3vgQaq/DkO2uAQAKUidQMRQjHs3T5vyb7PcXCe
 DVV3PO+xKIFl+eWbjDYH2OdPL8OzgyNcGnwtHkdogKklWvYMD002vQ9YmNa2cbJn
 cO5d8jzSRtsTTLSbtjipFIrvJ8FxedX3Jay0cvEbaEqkgZXJV1sXN5CJ/Cseyf+G
 IZrG047Kf4V3inV8RDvJ9U/VcSlIcst9icZOuLlONvhXM7f+R5CkvqwUn4yVOObt
 Wwq32r47Dd9BwzrpxM//7haDvAXYm/xcP3bImN/3LAAwYPGkswxOe1I7Q62+fbpe
 dd/FSfhe6nRjStKTtH7T+AQk1VJKw34su9/FSxzIZaCzHYMco5CIziCwi0s4BocR
 GqZ0E0oPxWvrrFhljBxt1wA4d2j354Wq2cGbmb7rQpJTEbfGH5nDHqF1FAbMmd8N
 F9H6tSCvh1xJaJngGZjlMsgs6TkqyQEnCjk7SSAs1XS+qyrcyOWk7ydzAAc/RIHl
 iIN4aLcL7ix1rcoVttw+4VOSvihas6nTvRPPwVTbHO5003QpXdr3dckQaASP3PTd
 wky7blVk8+O8Y242F0AAYUb04agZ+KpqsaOcCL3SIPc3yBv3JCNCNy0gH4WIBX66
 yYxTgRtaNhHiUWaVQLximq1QUjz+vsTE07FI56PSabz1e/RkRp+BbrwaYLKYy+/F
 jBfRpP7pkPIWJhrPmYpJ
 =fKei
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'jsnow/tags/ide-pull-request' into staging

# gpg: Signature made Mon 14 Nov 2016 04:16:48 PM GMT
# gpg:                using RSA key 0x7DEF8106AAFC390E
# gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>"
# Primary key fingerprint: FAEB 9711 A12C F475 812F  18F2 88A9 064D 1835 61EB
#      Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76  CBD0 7DEF 8106 AAFC 390E

* jsnow/tags/ide-pull-request:
  ahci-test: add QMP tray test for ATAPI
  libqos/ahci: Add get_sense and test_ready
  libqos/ahci: Add ATAPI tray macros
  libqos/ahci: Support expected errors
  libqtest: add qmp_eventwait_ref
  block-backend: Always notify on blk_eject
  ahci-test: test atapi read_cd with bcl, nb_sectors = 0
  ahci-test: Create smaller test ISO images
  atapi: classify read_cd as conditionally returning data

Message-id: 1479140746-22142-1-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-11-14 17:07:16 +00:00
John Snow c47ee043dc block-backend: Always notify on blk_eject
blk_eject is only used by scsi-disk and atapi, and in both cases we
only attempt to invoke blk_eject if we have a bona-fide change in
tray state.

The "issue" here is that the tray state does not generate a QMP event
unless there is a medium/BDS attached to the device, so if libvirt et al
are waiting for a tray event to occur from an empty-but-closed drive,
software opening that drive will not emit an event and libvirt will
wait forever.

Change this by modifying blk_eject to always emit an event, instead of
conditionally on a "real" backend eject.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1373264

Reported-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1478553214-497-2-git-send-email-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2016-11-14 11:15:54 -05:00
Fam Zheng 4e6d13c983 raw-posix: Rename 'raw_s' to 'rs'
It is too confusing because it sounds like a BDRVRawState variable.

Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1477565117-17230-1-git-send-email-famz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-11-11 15:56:22 +01:00
Kevin Wolf 07555ba6f3 nfs: Fix memory leak in nfs_file_create()
The leak was introduced in commit 94d6a7a7.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-11-11 15:54:55 +01:00
Alberto Garcia 9dd76f82d9 qcow2: Remove stale FIXME comment
It was from the time when none of the global functions had a qcow2_
prefix.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-11-11 15:54:55 +01:00
Tomáš Golembiovský 80a15e3e2e raw_bsd: don't check size alignment when only offset is set
We make sure that the size is aligned to sector length to prevent any
round ups. Otherwise we could end up reading/writing data outside the
area specified by user. This is only needed when user supplies the size
option to avoid any surprises. It is not necessary when only offset is
set.

More over, the check made it difficult to use the offset option without
size option. The check puts unneeded restriction on the offset which had
to be aligned too. Because bdrv_getlength() returns aligned value having
unaligned offset would make the check fail.

Signed-off-by: Tomáš Golembiovský <tgolembi@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-11-11 15:54:55 +01:00
Tomáš Golembiovský 40332872fe raw_bsd: move check to prevent overflow
When only offset is specified but no size and the offset is greater than
the real size of the containing device an overflow occurs when parsing
the options. This overflow is harmless because we do check for this
exact situation little bit later, but it leads to an error message with
weird values. It is better to do the check is sooner and prevent the
overflow.

Signed-off-by: Tomáš Golembiovský <tgolembi@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-11-11 15:54:55 +01:00
Ashijeet Acharya 9a80832abf block/ssh: Code cleanup for unused parameter
This patch drops the unused parameter "BDRVSSHState" being passed into
the ssh_config() function and does code cleanup. The unused parameter
was introduced by the commit c322712.

Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-11-11 15:54:55 +01:00
Ashijeet Acharya a1d4e38a8b block/nbd: Fix the leaked visitor
This patch frees the leaked visitor in nbd_refresh_filename() and uses
visit_free() to fix it. The leak was introduced by the commit 491d6c7.

Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-11-11 15:54:55 +01:00
Kevin Wolf e6af1e0854 block: Don't mark node clean after failed flush
Commit 3ff2f67a changed bdrv_co_flush() so that no flush is issues if
the image hasn't been dirtied since the last flush. This is not quite
correct: The condition should be that the image hasn't been dirtied
since the last _successful_ flush. This patch changes the logic
accordingly.

Without this fix, subsequent bdrv_co_flush() calls would return success
without actually doing anything even though the image is still dirty.
The difference is visible in some blkdebug test cases where error
messages incorrectly disappeared after commit 3ff2f67a.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1478300595-10090-1-git-send-email-kwolf@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-11-08 16:06:35 +00:00
Stefan Hajnoczi 199a5bde46 * NBD bugfix (Changlong)
* NBD write zeroes support (Eric)
 * Memory backend fixes (Haozhong)
 * Atomics fix (Alex)
 * New AVX512 features (Luwei)
 * "make check" logging fix (Paolo)
 * Chardev refactoring fallout (Paolo)
 * Small checkpatch improvements (Paolo, Jeff)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQExBAABCAAbBQJYGaRPFBxwYm9uemluaUByZWRoYXQuY29tAAoJEL/70l94x66D
 XKgH/RgNtosBTqJsmphkS7wACFAFOf7Uq46ajoKfB66Pt1J/++pFQg4TApPYkb7j
 KlKeKmXa7hb6+Jg8325H4zGkGno4kn2dE+OnznaB1xPKwiZVAMQVzQsagsEVqpno
 k/5PBVRptIiuHQKyU29Go0CxbWJBTH0O14S7rDK4YDF0YMnuT280HQOI3jdu1igV
 G/Q+CMgfk+yXf6GWHE8Z9sNq7n0ha8qgruA/X3NC7+pAvEsUcAP065zwLp9weYuK
 W1MU68L7Ub4tRo0SVf1HFkDUNdMv4T4hg+wpGe1GwthJWexHu9x0YAQBy60ykJb6
 NtHwjLwCUWtm7AiZD/btsOJPmjk=
 =+Dt/
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* NBD bugfix (Changlong)
* NBD write zeroes support (Eric)
* Memory backend fixes (Haozhong)
* Atomics fix (Alex)
* New AVX512 features (Luwei)
* "make check" logging fix (Paolo)
* Chardev refactoring fallout (Paolo)
* Small checkpatch improvements (Paolo, Jeff)

# gpg: Signature made Wed 02 Nov 2016 08:31:11 AM GMT
# gpg:                using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (30 commits)
  main-loop: Suppress I/O thread warning under qtest
  docs/rcu.txt: Fix minor typo
  vl: exit qemu on guest panic if -no-shutdown is not set
  checkpatch: allow spaces before parenthesis for 'coroutine_fn'
  x86: add AVX512_4VNNIW and AVX512_4FMAPS features
  slirp: fix CharDriver breakage
  qemu-char: do not forward events through the mux until QEMU has started
  nbd: Implement NBD_CMD_WRITE_ZEROES on client
  nbd: Implement NBD_CMD_WRITE_ZEROES on server
  nbd: Improve server handling of shutdown requests
  nbd: Refactor conversion to errno to silence checkpatch
  nbd: Support shorter handshake
  nbd: Less allocation during NBD_OPT_LIST
  nbd: Let client skip portions of server reply
  nbd: Let server know when client gives up negotiation
  nbd: Share common option-sending code in client
  nbd: Send message along with server NBD_REP_ERR errors
  nbd: Share common reply-sending code in server
  nbd: Rename struct nbd_request and nbd_reply
  nbd: Rename NbdClientSession to NBDClientSession
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-11-03 16:32:30 +00:00
Eric Blake fa778fffdf nbd: Implement NBD_CMD_WRITE_ZEROES on client
Upstream NBD protocol recently added the ability to efficiently
write zeroes without having to send the zeroes over the wire,
along with a flag to control whether the client wants a hole.

The generic block code takes care of falling back to the obvious
write of lots of zeroes if we return -ENOTSUP because the server
does not have WRITE_ZEROES.

Ideally, since NBD_CMD_WRITE_ZEROES does not involve any data
over the wire, we want to support transactions that are much
larger than the normal 32M limit imposed on NBD_CMD_WRITE.  But
the server may still have a limit smaller than UINT_MAX, so
until experimental NBD protocol additions for advertising various
command sizes is finalized (see [1], [2]), for now we just stick to
the same limits as normal writes.

[1] https://github.com/yoe/nbd/blob/extension-info/doc/proto.md
[2] https://sourceforge.net/p/nbd/mailman/message/35081223/

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1476469998-28592-17-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02 09:28:56 +01:00
Eric Blake ed2dd91267 nbd: Rename struct nbd_request and nbd_reply
Our coding convention prefers CamelCase names, and we already
have other existing structs with NBDFoo naming.  Let's be
consistent, before later patches add even more structs.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1476469998-28592-6-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02 09:28:55 +01:00
Eric Blake 10676b81a9 nbd: Rename NbdClientSession to NBDClientSession
It's better to use consistent capitalization of the namespace
used for NBD functions; we have more instances of NBD* than
Nbd*.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1476469998-28592-5-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02 09:28:55 +01:00
Eric Blake b626b51a67 nbd: Treat flags vs. command type as separate fields
Current upstream NBD documents that requests have a 16-bit flags,
followed by a 16-bit type integer; although older versions mentioned
only a 32-bit field with masking to find flags.  Since the protocol
is in network order (big-endian over the wire), the ABI is unchanged;
but dealing with the flags as a separate field rather than masking
will make it easier to add support for upcoming NBD extensions that
increase the number of both flags and commands.

Improve some comments in nbd.h based on the current upstream
NBD protocol (https://github.com/yoe/nbd/blob/master/doc/proto.md),
and touch some nearby code to keep checkpatch.pl happy.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1476469998-28592-3-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02 09:28:55 +01:00
Changlong Xie 9bc9732fae nbd: Use CoQueue for free_sema instead of CoMutex
NBD is using the CoMutex in a way that wasn't anticipated. For example, if there are
N(N=26, MAX_NBD_REQUESTS=16) nbd write requests, so we will invoke nbd_client_co_pwritev
N times.
----------------------------------------------------------------------------------------
time request Actions
1    1       in_flight=1, Coroutine=C1
2    2       in_flight=2, Coroutine=C2
...
15   15      in_flight=15, Coroutine=C15
16   16      in_flight=16, Coroutine=C16, free_sema->holder=C16, mutex->locked=true
17   17      in_flight=16, Coroutine=C17, queue C17 into free_sema->queue
18   18      in_flight=16, Coroutine=C18, queue C18 into free_sema->queue
...
26   N       in_flight=16, Coroutine=C26, queue C26 into free_sema->queue
----------------------------------------------------------------------------------------

Once nbd client recieves request No.16' reply, we will re-enter C16. It's ok, because
it's equal to 'free_sema->holder'.
----------------------------------------------------------------------------------------
time request Actions
27   16      in_flight=15, Coroutine=C16, free_sema->holder=C16, mutex->locked=false
----------------------------------------------------------------------------------------

Then nbd_coroutine_end invokes qemu_co_mutex_unlock what will pop coroutines from
free_sema->queue's head and enter C17. More free_sema->holder is C17 now.
----------------------------------------------------------------------------------------
time request Actions
28   17      in_flight=16, Coroutine=C17, free_sema->holder=C17, mutex->locked=true
----------------------------------------------------------------------------------------

In above scenario, we only recieves request No.16' reply. As time goes by, nbd client will
almostly recieves replies from requests 1 to 15 rather than request 17 who owns C17. In this
case, we will encounter assert "mutex->holder == self" failed since Kevin's commit 0e438cdc
"coroutine: Let CoMutex remember who holds it". For example, if nbd client recieves request
No.15' reply, qemu will stop unexpectedly:
----------------------------------------------------------------------------------------
time request       Actions
29   15(most case) in_flight=15, Coroutine=C15, free_sema->holder=C17, mutex->locked=false
----------------------------------------------------------------------------------------

Per Paolo's suggestion "The simplest fix is to change it to CoQueue, which is like a condition
variable", this patch replaces CoMutex with CoQueue.

Cc: Wen Congyang <wency@cn.fujitsu.com>
Reported-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Message-Id: <1476267508-19499-1-git-send-email-xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-01 16:06:57 +01:00
John Snow c87621ea68 blockjobs: split interface into public/private, Part 1
To make it a little more obvious which functions are intended to be
public interface and which are intended to be for use only by jobs
themselves, split the interface into "public" and "private" files.

Convert blockjobs (e.g. block/backup) to using the private interface.
Leave blockdev and others on the public interface.

There are remaining uses of private state by qemu-img, and several
cases in blockdev.c and block/io.c where we grab job->blk for the
purposes of acquiring an AIOContext.

These will be corrected in future patches.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1477584421-1399-7-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-01 08:04:56 -04:00
John Snow 8254b6d953 blockjob: centralize QMP event emissions
There's no reason to leave this to blockdev; we can do it in blockjobs
directly and get rid of an extra callback for most users.

All non-internal events, even those created outside of QMP, will
consistently emit events.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1477584421-1399-5-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-01 07:55:57 -04:00
John Snow 47970dfb0a Replication/Blockjobs: Create replication jobs as internal
Bubble up the internal interface to commit and backup jobs, then switch
replication tasks over to using this methodology.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1477584421-1399-4-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-01 07:55:57 -04:00
John Snow f81e0b4532 blockjobs: Allow creating internal jobs
Add the ability to create jobs without an ID.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1477584421-1399-3-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-01 07:55:57 -04:00
Prasanna Kumar Kalever 53d9837fb8 block/gluster: fix port type in the QAPI options list
After introduction of qapi schema in gluster block driver code, the port
type is now string as per InetSocketAddress

{ 'struct': 'InetSocketAddress',
  'data': {
    'host': 'str',
    'port': 'str',
    '*to': 'uint16',
    '*ipv4': 'bool',
    '*ipv6': 'bool' } }

but the current code still treats it as QEMU_OPT_NUMBER, hence fixing port
to accept QEMU_OPT_STRING.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-01 07:55:57 -04:00
Prasanna Kumar Kalever c56ac33b7a block/gluster: improve defense over string to int conversion
using atoi() for converting string to int may be error prone in case if
string supplied in the argument is not a fold of numerical number,

This is not a bug because in the existing code,

static QemuOptsList runtime_tcp_opts = {
    .name = "gluster_tcp",
    .head = QTAILQ_HEAD_INITIALIZER(runtime_tcp_opts.head),
    .desc = {
        ...
        {
            .name = GLUSTER_OPT_PORT,
            .type = QEMU_OPT_NUMBER,
            .help = "port number ...",
        },
...
};

port type is QEMU_OPT_NUMBER, before we actually reaches atoi() port is already
defended by parse_option_number()

However It is a good practice to use function like parse_uint_full()
over atoi() to keep port self defended

Note: As now the port string to int conversion has its defence code set,
and also we understand that port argument is actually a string type,
in the follow up patch let's move port type from QEMU_OPT_NUMBER to
QEMU_OPT_STRING

[Jeff Cody: removed spurious parenthesis]

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-01 07:55:57 -04:00
Fam Zheng 6f13acf97e block: Turn on "unmap" in active commit
We already specified BDRV_O_UNMAP when opening images in 'qemu-img
commit', but didn't turn on the "unmap" in the active commit job. This
patch fixes that so that zeroed clusters in top image can be discarded
which is desired in the virt-sparsify use case, where a temporary
overlay is created and fstrim'ed before commiting back, to free space in
the original image.

This also enables it for block-commit.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1474974892-5031-1-git-send-email-famz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-01 07:55:57 -04:00
Prasanna Kumar Kalever 6349c15410 block/gluster: memory usage: use one glfs instance per volume
Currently, for every drive accessed via gfapi we create a new glfs
instance (call glfs_new() followed by glfs_init()) which could consume
memory in few 100 MB's, from the table below it looks like for each
instance ~300 MB VSZ was consumed

Before:
-------
Disks   VSZ     RSS
1       1098728 187756
2       1430808 198656
3       1764932 199704
4       2084728 202684

This patch maintains a list of pre-opened glfs objects. On adding
a new drive belonging to the same gluster volume, we just reuse the
existing glfs object by updating its refcount.

With this approch we shrink up the unwanted memory consumption and
glfs_new/glfs_init calls for accessing a disk (file) if belongs to
same volume.

From below table notice that the memory usage after adding a disk
(which will reuse the existing glfs object hence) is in negligible
compared to before.

After:
------
Disks   VSZ     RSS
1       1101964 185768
2       1109604 194920
3       1114012 196036
4       1114496 199868

Disks: number of -drive
VSZ: virtual memory size of the process in KiB
RSS: resident set size, the non-swapped physical memory (in kiloBytes)

VSZ and RSS are analyzed using 'ps aux' utility.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1477581890-4811-1-git-send-email-prasanna.kalever@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-01 07:55:57 -04:00
Jeff Cody d9b789745b block: add gluster ifdef guard checks for SEEK_DATA/SEEK_HOLE support
Add checks to see if the system compiling QEMU has support for
SEEK_HOLE/SEEK_DATA.  If the system does not, we will flag that seek
data is unsupported in gluster.

Note: this is not a check on whether the gluster server itself supports
SEEK_DATA (that is already done during runtime), but rather if the
compilation environment supports SEEK_DATA.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
Message-id: 00370bce5c98140d6c56ad5145635ec6551265cc.1475876377.git.jcody@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-01 07:55:57 -04:00
Xiubo Li e38f643a1d rbd: make the code more readable
Make it a bit clearer and more readable.

Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1476519973-6436-1-git-send-email-lixiubo@cmss.chinamobile.com
CC: John Snow <jsnow@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-01 07:55:57 -04:00
Ashijeet Acharya 94d6a7a76e block/nfs: Introduce runtime_opts in NFS
Make NFS block driver use various fine grained runtime_opts.
Set .bdrv_parse_filename() to nfs_parse_filename() and introduce two
new functions nfs_parse_filename() and nfs_parse_uri() to help parsing
the URI.
Add a new option "server" which then accepts a new struct NFSServer.

Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
[ kwolf: Fixed client->path ]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-31 16:52:39 +01:00
Tomáš Golembiovský 2fdc70452a raw_bsd: add offset and size options
Added two new options 'offset' and 'size'. This makes it possible to use
only part of the file as a device. This can be used e.g. to limit the
access only to single partition in a disk image or use a disk inside a
tar archive (like OVA).

When 'size' is specified we do our best to honour it.

Signed-off-by: Tomáš Golembiovský <tgolembi@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-31 16:52:39 +01:00
Alberto Garcia 61b49e48b3 block: Support streaming to an intermediate layer
This makes sure that the image we are streaming into is open in
read-write mode during the operation.

Operation blockers are also set in all intermediate nodes, since they
will be removed from the chain afterwards.

Finally, this also unblocks the stream operation in backing files.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-31 16:52:38 +01:00
Alberto Garcia f3ede4b05d block: Block all intermediate nodes in commit_active_start()
When block-commit is launched without the top parameter, it uses
internally a mirror block job. In that case all intermediate nodes
between the active and base nodes must be blocked as well.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-31 16:52:38 +01:00
Alberto Garcia 3e4c5122cb block: Block all nodes involved in the block-commit operation
After a successful block-commit operation all nodes between top and
base are removed from the backing chain, and top's overlay needs to
be updated to point to base. Because of that we should prevent other
block jobs from messing with them.

This patch blocks all operations in these nodes in commit_start().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-31 16:52:38 +01:00
Alberto Garcia b7340d002e block: Use block_job_add_bdrv() in backup_start()
Use block_job_add_bdrv() instead of blocking all operations in
backup_start() and unblocking them in backup_run().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-31 16:52:38 +01:00
Alberto Garcia cee3c6b5ca block: Use block_job_add_bdrv() in mirror_start_job()
Use block_job_add_bdrv() instead of blocking all operations in
mirror_start_job() and unblocking them in mirror_exit().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-31 16:52:38 +01:00
Alberto Garcia c0778f6693 block: Add bdrv_drain_all_{begin,end}()
bdrv_drain_all() doesn't allow the caller to do anything after all
pending requests have been completed but before block jobs are
resumed.

This patch splits bdrv_drain_all() into _begin() and _end() for that
purpose. It also adds aio_{disable,enable}_external() calls to disable
external clients in the meantime.

An important restriction of this split is that no new block jobs or
BlockDriverStates can be created between the bdrv_drain_all_begin()
and bdrv_drain_all_end() calls. This is not a concern now because
we'll only be using this in bdrv_reopen_multiple(), but it must be
dealt with if we ever have other uses cases in the future.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-31 16:51:14 +01:00
Ashijeet Acharya 1059f1bb42 block/ssh: Use InetSocketAddress options
Drop the use of legacy options in favour of the InetSocketAddress
options.

Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-31 16:49:13 +01:00
Ashijeet Acharya 0da5b8ef5d block/ssh: Add InetSocketAddress and accept it
Add InetSocketAddress compatibility to SSH driver.

Add a new option "server" to the SSH block driver which then accepts
a InetSocketAddress.

"host" and "port" are supported as legacy options and are mapped to
their InetSocketAddress representation.

Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-31 16:49:13 +01:00
Ashijeet Acharya 89dbe18089 block/ssh: Add ssh_has_filename_options_conflict()
We have 5 options plus ("server") option which is added in the next
patch that conflict with specifying a SSH filename. We need to iterate
over all the options to check whether its key has an "server." prefix.

This iteration will help us adding the new option "server" easily.

Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-31 16:49:13 +01:00
Peter Maydell 5273a45e75 -----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
 
 iQEcBAABCAAGBQJYE2ULAAoJEMo1YkxqkXHGvvAH/iDPIAiwBXbndL3KhQTneSHn
 ctd4I3VK1/VVTIBRJIetqETiWiAm/WoRhI9kBc/NrQxBFx3ko+fpSYFS2t6lJYnV
 EX0vjTKjFhr05tOTQDH/SQtHdU5x/x2M8SsxqrCcTyLm5VDfdPeBlMBfSNMj/L2K
 bwinANVEwr6LOM0h8weQ0SvOCa5MLII2p5ufGwKQmhUY5tgZvFlyPa+quDVisKoE
 7CpLwWHmUQSNxUXSaru90osUJyk90wCcYxPpJN3YO1MHvpH4kG8DpZ8bnFqLAoNw
 zkRdqIrlfntD+mKDqRU1y0GXxu9I4VK1UDcQyRFoSdMi2oHR+L018sQEjCYTAXo=
 =n+CF
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/famz/tags/for-upstream' into staging

# gpg: Signature made Fri 28 Oct 2016 15:47:39 BST
# gpg:                using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <famz@redhat.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021  AD56 CA35 624C 6A91 71C6

* remotes/famz/tags/for-upstream:
  aio: convert from RFifoLock to QemuRecMutex
  qemu-thread: introduce QemuRecMutex
  iothread: release AioContext around aio_poll
  block: only call aio_poll on the current thread's AioContext
  qemu-img: call aio_context_acquire/release around block job
  qemu-io: acquire AioContext
  block: prepare bdrv_reopen_multiple to release AioContext
  replication: pass BlockDriverState to reopen_backing_file
  iothread: detach all block devices before stopping them
  aio: introduce qemu_get_current_aio_context
  sheepdog: use BDRV_POLL_WHILE
  nfs: use BDRV_POLL_WHILE
  nfs: move nfs_set_events out of the while loops
  block: introduce BDRV_POLL_WHILE
  qed: Implement .bdrv_drain
  block: change drain to look only at one child at a time
  block: add BDS field to count in-flight requests
  mirror: use bdrv_drained_begin/bdrv_drained_end
  blockjob: introduce .drain callback for jobs
  replication: interrupt failover if the main device is closed

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-31 10:10:16 +00:00
Peter Maydell 01b601f061 Merge qio 2016/10/27 v1
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJYEfjrAAoJEL6G67QVEE/fdU4P/i7yBJo436OpkdgeWS8AWuFr
 ptZ+Fj/weGka5GU9E3KQu36kbSgrtfcgwTHphCMXnZ0YCeKQDuM57f7LNiN6qheB
 nqgJvJioLbUvLTQvCHOISM7bWOnYvASBmYtLJFtUcP/jhdOy61KaADnJ+7MbliNv
 yJSW2RN+s/y9nUb+dxEpIXXUVMRa6BX+wHW3O44c1oLn6/Pe20aJeHTyDx3qiBhD
 8RYXUgRZopH2bouBSzXgMQTbn/QMD/dC81WQlHKlt4swffyei2D/1pciOcuc0SXz
 +SZdkTre5JB5Kd6DU8zQ6PrrIt1nPmLSptSyhQvNxm+uWNWHnFcW1s2aYmf/ikjl
 4boW37ayJx09mns8yv7TerzEPbL5qJvVX8Dsnb6telkvrS9hy9S1xuIB5xHbt6/h
 vwFmCdwaZoGpDDaoXRL+9k9TOI9BbEMKX33nAPDqvEXLMIf+og4fmweTKcY4XTRL
 /Fdg1H71v8Ayv+r5TJOKwFg3PNNjnvqkbk1psS+aaW7dup43iaYGIKWy+VFaCufk
 hPXLOtR5lUsYC2qm+nkjPIgoP7D8oZx4AGkCHbYsqzi+l1lynZH3rBIs8ggLr72o
 FFk4g0sNYe1ccAa89jFEgWIQbS0N6ckUXCv12g3eyF/UIC1F35/mGGugSRnTXuc2
 a/WsvgU7pGBrtqXcg7lF
 =gsxL
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/berrange/tags/pull-qio-2016-10-27-1' into staging

Merge qio 2016/10/27 v1

# gpg: Signature made Thu 27 Oct 2016 13:54:03 BST
# gpg:                using RSA key 0xBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg:                 aka "Daniel P. Berrange <berrange@redhat.com>"
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E  8E3F BE86 EBB4 1510 4FDF

* remotes/berrange/tags/pull-qio-2016-10-27-1:
  main: set names for main loop sources created
  vnc: set name for all I/O channels created
  migration: set name for all I/O channels created
  char: set name for all I/O channels created
  nbd: set name for all I/O channels created
  io: add ability to set a name for IO channels
  io: Add a QIOChannelSocket cleanup test
  io: set LISTEN flag explicitly for listen sockets
  io: Introduce a qio_channel_set_feature() helper
  io: Use qio_channel_has_feature() where applicable
  io: Fix double shift usages on QIOChannel features

Conflicts:
	qemu-char.c

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-28 15:30:55 +01:00
Paolo Bonzini c9d1a56174 block: only call aio_poll on the current thread's AioContext
aio_poll is not thread safe; for example bdrv_drain can hang if
the last in-flight I/O operation is completed in the I/O thread after
the main thread has checked bs->in_flight.

The bug remains latent as long as all of it is called within
aio_context_acquire/aio_context_release, but this will change soon.

To fix this, if bdrv_drain is called from outside the I/O thread,
signal the main AioContext through a dummy bottom half.  The event
loop then only runs in the I/O thread.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1477565348-5458-18-git-send-email-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-28 21:50:18 +08:00
Paolo Bonzini 720150f318 block: prepare bdrv_reopen_multiple to release AioContext
After the next patch bdrv_drain_all will have to be called without holding any
AioContext.  Prepare to do this by adding an AioContext argument to
bdrv_reopen_multiple.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1477565348-5458-15-git-send-email-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-28 21:50:18 +08:00
Paolo Bonzini 8dd9006e9b replication: pass BlockDriverState to reopen_backing_file
This will be needed in the next patch to retrieve the AioContext.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1477565348-5458-14-git-send-email-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-28 21:50:18 +08:00
Paolo Bonzini f11672dbc0 sheepdog: use BDRV_POLL_WHILE
This is important when the sheepdog driver works on a BlockDriverState
that is attached to an I/O thread other than the main thread.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-Id: <1477565348-5458-11-git-send-email-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-28 21:50:18 +08:00
Paolo Bonzini d746427aaf nfs: use BDRV_POLL_WHILE
This will make it possible to use nfs_get_allocated_file_size on
a file that is not in the main AioContext.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1477565348-5458-10-git-send-email-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-28 21:50:18 +08:00
Paolo Bonzini aa92d6c460 nfs: move nfs_set_events out of the while loops
nfs_set_events only needs to be called once before entering the
while loop; afterwards, nfs_process_read and nfs_process_write
take care of it.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1477565348-5458-9-git-send-email-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-28 21:50:18 +08:00
Paolo Bonzini 88b062c203 block: introduce BDRV_POLL_WHILE
We want the BDS event loop to run exclusively in the iothread that
owns the BDS's AioContext.  This macro will provide the synchronization
between the two event loops; for now it just wraps the common idiom
of a while loop around aio_poll.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-Id: <1477565348-5458-8-git-send-email-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-28 21:50:18 +08:00
Fam Zheng 6653a73d12 qed: Implement .bdrv_drain
The "need_check_timer" is used to clear the "NEED_CHECK" flag in the
image header after a grace period once metadata update has finished. To
comply with the bdrv_drain semantics, we should make sure it remains
deleted once .bdrv_drain is called.

The change to qed_need_check_timer_cb is needed because bdrv_qed_drain
is called after s->bs has been drained, and should not operate on it;
instead it should operate on the BdrvChild-ren exclusively.  Doing so
is easy because QED does not have a bdrv_co_flush_to_os callback, hence
all that is needed to flush it is to ensure writes have reached the disk.

Based on commit df9a681dc9 (which however included some unrelated
hunks, possibly due to a merge failure or an overlooked squash).
The patch was reverted because at the time bdrv_qed_drain could call
qed_plug_allocating_write_reqs while an allocating write was queued.
This however is not possible anymore after the previous patch;
.bdrv_drain is only called after all writes have completed at the
QED level, and its purpose is to trigger metadata writes in bs->file.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1477565348-5458-7-git-send-email-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-28 21:50:18 +08:00
Paolo Bonzini d42cf28837 block: change drain to look only at one child at a time
bdrv_requests_pending is checking children to also wait until internal
requests (such as metadata writes) have completed.  However, checking
children is in general overkill.  Children requests can be of two kinds:

- requests caused by an operation on bs, e.g. a bdrv_aio_write to bs
causing a write to bs->file->bs.  In this case, the parent's in_flight
count will always be incremented by at least one for every request in
the child.

- asynchronous metadata writes or flushes.  Such writes can be started
even if bs's in_flight count is zero, but not after the .bdrv_drain
callback has been invoked.

This patch therefore changes bdrv_drain to finish I/O in the parent
(after which the parent's in_flight will be locked to zero), call
bdrv_drain (after which the parent will not generate I/O on the child
anymore), and then wait for internal I/O in the children to complete.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-Id: <1477565348-5458-6-git-send-email-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-28 21:50:18 +08:00
Paolo Bonzini 9972354856 block: add BDS field to count in-flight requests
Unlike tracked_requests, this field also counts throttled requests,
and remains non-zero if an AIO operation needs a BH to be "really"
completed.

With this change, it is no longer necessary to have a dummy
BdrvTrackedRequest for requests that are never serialising, and
it is no longer necessary to poll the AioContext once after
bdrv_requests_pending(bs) returns false.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-Id: <1477565348-5458-5-git-send-email-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-28 21:50:18 +08:00
Paolo Bonzini 9a0cec664e mirror: use bdrv_drained_begin/bdrv_drained_end
Ensure that there are no changes between the last check to
bdrv_get_dirty_count and the switch to the target.

There is already a bdrv_drained_end call, we only need to ensure
that bdrv_drained_begin is not called twice.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-Id: <1477565348-5458-4-git-send-email-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-28 21:50:18 +08:00
Paolo Bonzini bae8196d9f blockjob: introduce .drain callback for jobs
This is required to decouple block jobs from running in an
AioContext.  With multiqueue block devices, a BlockDriverState
does not really belong to a single AioContext.

The solution is to first wait until all I/O operations are
complete; then loop in the main thread for the block job to
complete entirely.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-Id: <1477565348-5458-3-git-send-email-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-28 21:50:18 +08:00
Paolo Bonzini 50ab0e0908 replication: interrupt failover if the main device is closed
Without this change, there is a race condition in tests/test-replication.
Depending on how fast the failover job (active commit) runs, there is a
chance of two bad things happening:

1) replication_done can be called after the secondary has been closed
and hence when the BDRVReplicationState is not valid anymore.

2) two copies of the active disk are present during the
/replication/secondary/stop test (that test runs immediately after
/replication/secondary/start, which tests failover).  This causes the
corruption detector to fire.

Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Reviewed-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-Id: <1477565348-5458-2-git-send-email-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-28 21:50:18 +08:00
Max Reitz f84d431b86 block/nbd: Use SocketAddress options
Drop the use of legacy options in favor of the SocketAddress
representation, even for internal use (i.e. for storing the result of
the filename parsing).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-27 19:05:23 +02:00
Max Reitz 491d6c7c4e block/nbd: Accept SocketAddress
Add a new option "server" to the NBD block driver which accepts a
SocketAddress.

"path", "host" and "port" are still supported as legacy options and are
mapped to their corresponding SocketAddress representation.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-27 19:05:23 +02:00
Max Reitz 48c38e0b8d block/nbd: Add nbd_has_filename_options_conflict()
Right now, we have four possible options that conflict with specifying
an NBD filename, and a future patch will add another one ("address").
This future option is a nested QDict that is flattened at this point,
requiring us to test each option whether its key has an "address."
prefix. Therefore, we will then need to iterate through all options
(including the "export" option which was not covered so far).

Adding this iteration logic now will simplify adding the new option
later. A nice side effect is that the user will not receive a long list
of five options which are not supposed to be specified with a filename,
but we can actually print the problematic option.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-27 19:05:23 +02:00
Max Reitz fcfcd8ffcc block/nbd: Use qdict_put()
Instead of inlining this nice macro (i.e. resorting to
qdict_put_obj(..., QOBJECT(...))), use it.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-27 19:05:23 +02:00
Max Reitz 7edca33804 block/nbd: Default port in nbd_refresh_filename()
Instead of not emitting the port in nbd_refresh_filename(), just set it
to the default if the user did not specify it. This makes the logic a
bit simpler.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-27 19:05:23 +02:00
Max Reitz 442045cbce block/nbd: Reject port parameter without host
Currently, a port that is passed along with a UNIX socket path is
silently ignored. That is not exactly ideal, it should be an error
instead.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-27 19:05:23 +02:00
Max Reitz 82d73014a9 block/nbd: Drop trailing "." in error messages
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-27 19:05:23 +02:00
Kevin Wolf cbc14ac9c3 block: Remove bdrv_aio_ioctl()
It is unused now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-10-27 19:05:23 +02:00
Kevin Wolf 151a2930c4 raw: Implement .bdrv_co_ioctl instead of .bdrv_aio_ioctl
It's the simpler interface to use for the raw format driver.

Apart from that, this removes the last user of the AIO emulation
implemented by bdrv_aio_ioctl().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-10-27 19:05:23 +02:00
Kevin Wolf 16a389dc9e block: Introduce .bdrv_co_ioctl() driver callback
This allows drivers to implement ioctls in a coroutine-based way.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-10-27 19:05:23 +02:00
Kevin Wolf 61b2450414 block: Remove bdrv_ioctl()
It is unused now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-10-27 19:05:23 +02:00
Kevin Wolf 0d4377b3ea raw-posix: Don't use bdrv_ioctl()
Instead of letting raw-posix use the bdrv_ioctl() abstraction to issue
an ioctl to itself, just call ioctl() directly.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-10-27 19:05:23 +02:00
Kevin Wolf 48af776a5b block: Use blk_co_ioctl() for all BB level ioctls
All read/write functions already have a single coroutine-based function
on the BlockBackend level through which all requests go (no matter what
API style the external caller used) and which passes the requests down
to the block node level.

This patch exports a bdrv_co_ioctl() function and uses it to extend this
mode of operation to ioctls.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-10-27 19:05:22 +02:00
Kevin Wolf 7381e95cc2 block: Remove bdrv_aio_pdiscard()
It is unused now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-10-27 19:05:22 +02:00
Kevin Wolf 8c2e3dd55f block: Use blk_co_pdiscard() for all BB level discard
All read/write functions already have a single coroutine-based function
on the BlockBackend level through which all requests go (no matter what
API style the external caller used) and which passes the requests down
to the block node level.

This patch extends this mode of operation to discards.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-10-27 19:05:22 +02:00
Kevin Wolf be07a88981 block: Use blk_co_flush() for all BB level flushes
All read/write functions already have a single coroutine-based function
on the BlockBackend level through which all requests go (no matter what
API style the external caller used) and which passes the requests down
to the block node level.

This patch extends this mode of operation to flushes.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-10-27 19:05:22 +02:00
Daniel P. Berrange 0d73f7253e nbd: set name for all I/O channels created
Ensure that all I/O channels created for NBD are given names
to distinguish their respective roles.

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-10-27 09:13:10 +02:00
Daniel P. Berrange 7d5e199ade qapi: rename QmpOutputVisitor to QObjectOutputVisitor
The QmpOutputVisitor has no direct dependency on QMP. It is
valid to use it anywhere that one wants a QObject. Rename it
to better reflect its functionality as a generic QAPI
to QObject converter.

The commit before previous renamed the files, this one renames C
identifiers.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1475246744-29302-6-git-send-email-berrange@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Split into file rename and identifier rename]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-10-25 16:25:54 +02:00
Daniel P. Berrange b3db211f3c qapi: rename *qmp-*-visitor* to *qobject-*-visitor*
The QMP visitors have no direct dependency on QMP. It is
valid to use them anywhere that one has a QObject. Rename them
to better reflect their functionality as a generic QObject
to QAPI converter.

This is the first of three parts: rename the files.  The next two
parts will rename C identifiers.  The split is necessary to make git
rename detection work.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Split into file and identifier rename, two comments touched up]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-10-25 16:25:48 +02:00
Changlong Xie f4f2539bcf block/replication: Clarify 'top-id' parameter usage
The replication driver only supports the 'top-id' parameter for the
secondary side; it must not be supplied for the primary side.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Message-id: 1476247808-15646-1-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-10-24 17:56:07 +02:00
Fam Zheng 6d3f4049ba block: More operations for meta dirty bitmap
Callers can create an iterator of meta bitmap with
bdrv_dirty_meta_iter_new(), then use the bdrv_dirty_iter_* operations on
it. Meta iterators are also counted by bitmap->active_iterators.

Also add a couple of functions to retrieve granularity and count.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1476395910-8697-11-git-send-email-jsnow@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-10-24 17:56:07 +02:00
Vladimir Sementsov-Ogievskiy 882c36f590 block: BdrvDirtyBitmap serialization interface
Several functions to provide necessary access to BdrvDirtyBitmap for
block-migration.c

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[Add the "finish" parameters. - Fam]
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1476395910-8697-9-git-send-email-jsnow@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-10-24 17:56:07 +02:00
Fam Zheng 7105007a5c block: Assert that bdrv_release_dirty_bitmap succeeded
We use a loop over bs->dirty_bitmaps to make sure the caller is
only releasing a bitmap owned by bs. Let's also assert that in this case
the caller is releasing a bitmap that does exist.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1476395910-8697-7-git-send-email-jsnow@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-10-24 17:56:07 +02:00
Fam Zheng 15891fac7d block: Add two dirty bitmap getters
For dirty bitmap users to get the size and the name of a
BdrvDirtyBitmap.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1476395910-8697-6-git-send-email-jsnow@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-10-24 17:56:07 +02:00
Fam Zheng fb933437de block: Support meta dirty bitmap
The added group of operations enables tracking of the changed bits in
the dirty bitmap.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1476395910-8697-5-git-send-email-jsnow@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-10-24 17:56:07 +02:00
Fam Zheng dc162c8e4f block: Hide HBitmap in block dirty bitmap interface
HBitmap is an implementation detail of block dirty bitmap that should be hidden
from users. Introduce a BdrvDirtyBitmapIter to encapsulate the underlying
HBitmapIter.

A small difference in the interface is, before, an HBitmapIter is initialized
in place, now the new BdrvDirtyBitmapIter must be dynamically allocated because
the structure definition is in block/dirty-bitmap.c.

Two current users are converted too.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1476395910-8697-2-git-send-email-jsnow@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-10-24 17:56:07 +02:00
Paolo Bonzini 1ba7e15978 quorum: do not allocate multiple iovecs for FIFO strategy
In FIFO mode there are no parallel reads, hence there is no need to
allocate separate buffers and clone the iovecs.

The two cases of quorum_aio_cb are now even more different, and
most of quorum_aio_finalize is only needed in one of them, so split
them in separate functions.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1475685327-22767-3-git-send-email-pbonzini@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-10-24 17:56:06 +02:00
Paolo Bonzini 86ec252c19 quorum: change child_iter to children_read
This simplifies a bit the code by using the usual C "inclusive start,
exclusive end" pattern for ranges.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1475685327-22767-2-git-send-email-pbonzini@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-10-24 17:56:06 +02:00
Alberto Garcia 6bf77e1c2d throttle: Correct access to wrong BlockBackendPublic structures
In 27ccdd5259 the throttling fields were
moved from BlockDriverState to BlockBackend. However in a few cases
the code started using throttling fields from the active BlockBackend
instead of the round-robin token, making the algorithm behave
incorrectly.

This can cause starvation if there's a throttling group with several
drives but only one of them has I/O.

Cc: qemu-stable@nongnu.org
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-24 17:54:03 +02:00
Pino Toscano 3ac2f2f765 qapi: fix memory leak in bdrv_image_info_specific_dump
The 'obj' result of the visitor was not properly freed, like done in
other places doing a similar job.

Signed-off-by: Pino Toscano <ptoscano@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-24 17:54:03 +02:00
Halil Pasic 09237757a8 block: improve error handling in raw_open
Make raw_open for POSIX more consistent in handling errors by setting
the error object also when qemu_open fails. The error object was set
generally set in case of errors, but I guess this case was overlooked.
Do the same for win32.

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Tested-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com> (POSIX only)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-24 17:54:03 +02:00
Fam Zheng 170f4b2e5c qcow2: Support BDRV_REQ_MAY_UNMAP
Handling this is similar to what is done to the L2 entry in the case of
compressed clusters.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-24 17:54:03 +02:00
Roy Shterman e0ae49871a block/iscsi: Adding new iSER transport layer option
iSER is a new transport layer supported in Libiscsi,
iSER provides a zero-copy RDMA capable interface that can
improve performance.

In order to use the new iSER transport one need to have RDMA supported HW
and to choose iser as the protocol name in Libiscsi URI.

For now iSER memory buffers are pre-allocated and pre-registered,
hence in order to work with iSER from QEMU, one need to enable
MEMLOCK attribute in the VM to be large enough for all iSER buffers and RDMA
resources.

Signed-off-by: Roy Shterman <roysh@mellanox.com>
Message-Id: <1476000896-18632-3-git-send-email-roysh@mellanox.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-24 11:30:55 +02:00
Roy Shterman 583ec22e23 block/iscsi: Introducing new zero-copy API
A new API to deploy zero-copy command submission. The new API takes I/O
vectors list and number of I/O vectors to submit as input parameters
when initiating the command. New API must be used if working with
iSER transport option.

Signed-off-by: Roy Shterman <roysh@mellanox.com>
Message-Id: <1476000896-18632-2-git-send-email-roysh@mellanox.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-24 11:30:55 +02:00
Paolo Bonzini e948f663e9 rbd: shift byte count as a 64-bit value
Otherwise, reads of more than 2GB fail.  Until commit
7bbca9e290, reads of 2^41
bytes succeeded at least theoretically.

In fact, pdiscard ought to receive a 64-bit integer as the
count for the same reason.

Reported by Coverity.

Fixes: 7bbca9e290
Cc: qemu-stable@nongnu.org
Cc: kwolf@redhat.com
Cc: eblake@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-23 16:10:59 +02:00
Gonglei f844836ddc crypto: extend mode as a parameter in qcrypto_cipher_supports()
It can't guarantee all cipher modes are supported
if one cipher algorithm is supported by a backend.
Let's extend qcrypto_cipher_supports() to take both
the algorithm and mode as parameters.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-10-19 10:09:24 +01:00
Fam Zheng 27685a8dd0 dmg: Move libbz2 code to dmg-bz2.so
dmg.o was moved to block-obj-m in 5505e8b76 to become a separate module,
so that its reference to libbz2, since 6b383c08c, doesn't add an extra
library to the main executable.

Until recently, commit 06e60f70a (blockdev: Add dynamic module loading
for block drivers) moved it back to block-obj-y to simplify the design
of dynamic loading of block modules. But we don't want to lose the
feature of less library dependency on the main executable.

The solution here is to move only the bz2 related code to a separate
DSO file, and load it when dmg_open is called.

dmg_probe doesn't depend on bz2 support to work, and is the only code in
this file which can run before dmg_open.

While we are at it, fix the unhelpful cast of last argument passed to
dmg_uncompress_bz2.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1473043845-13197-4-git-send-email-famz@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-10-07 14:14:06 +02:00
Kevin Wolf 2d76e724cf block: Add qdev ID to DEVICE_TRAY_MOVED
The event currently only contains the BlockBackend name. However, with
anonymous BlockBackends, this is always the empty string. Add the qdev
ID (or if none was given, the QOM path) so that the user can still see
which device caused the event.

Event generation has to be moved from bdrv_eject() to the BlockBackend
because the BDS doesn't know the attached device, but that's easy
because blk_eject() is the only user of it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-10-07 13:34:22 +02:00
Kevin Wolf bbc8ea98bc block-backend: Remember if attached device is non-qdev
Almost all block devices are qdevified by now. This allows us to go back
from the BlockBackend to the DeviceState. xen_disk is the last device
that is missing. We'll remember in the BlockBackend if a xen_disk is
attached and can then disable any features that require going from a BB
to the DeviceState.

While at it, clearly mark the function used by xen_disk as legacy even
in its name, not just in TODO comments.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-10-07 13:34:22 +02:00
Kevin Wolf 2bf7e10f78 block: Add node name to BLOCK_IO_ERROR event
The event currently only contains the BlockBackend name. However, with
anonymous BlockBackends, this is always the empty string. Add the node
name so that the user can still see which block device caused the event.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-10-07 13:34:22 +02:00
Paolo Bonzini fffb6e1223 block: use aio_bh_schedule_oneshot
This simplifies bottom half handlers by removing calls to qemu_bh_delete and
thus removing the need to stash the bottom half pointer in the opaque
datum.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-07 13:34:07 +02:00
Paolo Bonzini 818bbc86c9 block: use bdrv_add_before_write_notifier
Register the notifier using the specific API for block devices.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-07 13:34:07 +02:00
Kevin Wolf b85114f8cf block: Use 'detect-zeroes' option for 'blockdev-change-medium'
Instead of modifying the new BDS after it has been opened, use the newly
supported 'detect-zeroes' option in bdrv_open_common() so that all
requirements are checked (detect-zeroes=unmap requires discard=unmap).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-09-29 14:13:39 +02:00
Kevin Wolf 0a4279d97c block/qapi: Move 'aio' option to file driver
The option whether or not to use a native AIO interface really isn't a
generic option for all drivers, but only applies to the native file
protocols. This patch moves the option in blockdev-add to the
appropriate places (raw-posix and raw-win32).

We still have to keep the flag BDRV_O_NATIVE_AIO for compatibility
because so far the AIO option was usually specified on the wrong layer
(the top-level format driver, which didn't even look at it) and then
inherited by the protocol driver (where it was actually used). We can't
forbid this use except in new interfaces.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-09-29 14:13:39 +02:00
John Snow 49137bf684 block-backend: remove blk_flush_all
We can teach Xen to drain and flush each device as it needs to, instead
of trying to flush ALL devices. This removes the last user of
blk_flush_all.

The function is therefore removed under the premise that any new uses
of blk_flush_all would be the wrong paradigm: either flush the single
device that requires flushing, or use an appropriate flush_all mechanism
from outside of the BlkBackend layer.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-29 14:13:38 +02:00
John Snow 4085f5c7a2 block: reintroduce bdrv_flush_all
Commit fe1a9cbc moved the flush_all routine from the bdrv layer to the
block-backend layer. In doing so, however, the semantics of the routine
changed slightly such that flush_all now used blk_flush instead of
bdrv_flush.

blk_flush can fail if the attached device model reports that it is not
"available," (i.e. the tray is open.) This changed the semantics of
flush_all such that it can now fail for e.g. open CDROM drives.

Reintroduce bdrv_flush_all to regain the old semantics without having to
alter the behavior of blk_flush or blk_flush_all, which are already
'doing the right thing.'

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-29 14:13:13 +02:00
Peter Maydell c640f2849e * thread-safe tb_flush (Fred, Alex, Sergey, me, Richard, Emilio,... :-)
* license clarification for compiler.h (Felipe)
 * glib cflags improvement (Marc-André)
 * checkpatch silencing (Paolo)
 * SMRAM migration fix (Paolo)
 * Replay improvements (Pavel)
 * IOMMU notifier improvements (Peter)
 * IOAPIC now defaults to version 0x20 (Peter)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQExBAABCAAbBQJX6kKUFBxwYm9uemluaUByZWRoYXQuY29tAAoJEL/70l94x66D
 M1UIAKCQ7XfWDoClYd1TyGZ+Qj3K3TrjwLDIl/Z258euyeZ9p7PpqYQ64OCRsREJ
 fsGQOqkFYDe7gi4epJiJOuu4oAW7Xu8G6lB2RfBd7KWVMhsl3Che9AEom7amzyzh
 yoN+g9gwKfAmYwpKyjYWnlWOSjUvif6o0DaTCQCMTaAoEM3b4HKdgHfr6A2dA/E/
 47rtIVp/jNExmrZkaOjnCDS1DJ8XYT3aVeoTkuzRFQ3DBzrAiPABn6B4ExP8IBcJ
 YLFX/W8xG7F3qyXbKQOV/uYM25A55WS5B0G94ZfSlDtUGa/avzS7df9DFD/IWQT+
 RpfiyDdeJueByiTw9R0ZYxFjhd8=
 =g7xm
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* thread-safe tb_flush (Fred, Alex, Sergey, me, Richard, Emilio,... :-)
* license clarification for compiler.h (Felipe)
* glib cflags improvement (Marc-André)
* checkpatch silencing (Paolo)
* SMRAM migration fix (Paolo)
* Replay improvements (Pavel)
* IOMMU notifier improvements (Peter)
* IOAPIC now defaults to version 0x20 (Peter)

# gpg: Signature made Tue 27 Sep 2016 10:57:40 BST
# gpg:                using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (28 commits)
  replay: allow replay stopping and restarting
  replay: vmstate for replay module
  replay: move internal data to the structure
  cpus-common: lock-free fast path for cpu_exec_start/end
  tcg: Make tb_flush() thread safe
  cpus-common: Introduce async_safe_run_on_cpu()
  cpus-common: simplify locking for start_exclusive/end_exclusive
  cpus-common: remove redundant call to exclusive_idle()
  cpus-common: always defer async_run_on_cpu work items
  docs: include formal model for TCG exclusive sections
  cpus-common: move exclusive work infrastructure from linux-user
  cpus-common: fix uninitialized variable use in run_on_cpu
  cpus-common: move CPU work item management to common code
  cpus-common: move CPU list management to common code
  linux-user: Add qemu_cpu_is_self() and qemu_cpu_kick()
  linux-user: Use QemuMutex and QemuCond
  cpus: Rename flush_queued_work()
  cpus: Move common code out of {async_, }run_on_cpu()
  cpus: pass CPUState to run_on_cpu helpers
  build-sys: put glib_cflags in QEMU_CFLAGS
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-09-28 23:02:56 +01:00
Stefan Hajnoczi fe121b9d3c linux-aio: fix re-entrant completion processing
Commit 0ed93d84ed ("linux-aio: process
completions from ioq_submit()") added an optimization that processes
completions each time ioq_submit() returns with requests in flight.
This commit introduces a "Co-routine re-entered recursively" error which
can be triggered with -drive format=qcow2,aio=native.

Fam Zheng <famz@redhat.com>, Kevin Wolf <kwolf@redhat.com>, and I
debugged the following backtrace:

  (gdb) bt
  #0  0x00007ffff0a046f5 in raise () at /lib64/libc.so.6
  #1  0x00007ffff0a062fa in abort () at /lib64/libc.so.6
  #2  0x0000555555ac0013 in qemu_coroutine_enter (co=0x5555583464d0) at util/qemu-coroutine.c:113
  #3  0x0000555555a4b663 in qemu_laio_process_completions (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:218
  #4  0x0000555555a4b874 in ioq_submit (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:331
  #5  0x0000555555a4ba12 in laio_do_submit (fd=fd@entry=13, laiocb=laiocb@entry=0x555559d38ae0, offset=offset@entry=2932727808, type=type@entry=1) at block/linux-aio.c:383
  #6  0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>, s=0x555557e2f7f0, fd=13, offset=2932727808, qiov=0x555559d38e20, type=1) at block/linux-aio.c:402
  #7  0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x55555663bcb0, offset=offset@entry=2932727808, bytes=bytes@entry=8192, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:804
  #8  0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x55555663bcb0, req=req@entry=0x555559d38d20, offset=offset@entry=2932727808, bytes=bytes@entry=8192, align=align@entry=512, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:1041
  #9  0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=2932727808, bytes=8192, qiov=qiov@entry=0x555559d38e20, flags=flags@entry=0) at block/io.c:1133
  #10 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=<optimized out>) at block/qcow2.c:1509
  #11 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x555556635890, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:804
  #12 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x555556635890, req=req@entry=0x555559d39000, offset=offset@entry=6178725888, bytes=bytes@entry=8192, align=align@entry=1, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:1041
  #13 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=flags@entry=0) at block/io.c:1133
  #14 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=0) at block/block-backend.c:783
  #15 0x0000555555a45266 in blk_aio_read_entry (opaque=0x5555577025e0) at block/block-backend.c:991
  #16 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:78

It turned out that re-entrant ioq_submit() and completion processing
between three requests caused this error.  The following check is not
sufficient to prevent recursively entering coroutines:

  if (laiocb->co != qemu_coroutine_self()) {
      qemu_coroutine_enter(laiocb->co);
  }

As the following coroutine backtrace shows, not just the current
coroutine (self) can be entered.  There might also be other coroutines
that are currently entered and transferred control due to the qcow2 lock
(CoMutex):

  (gdb) qemu coroutine 0x5555583464d0
  #0  0x0000555555ac0c90 in qemu_coroutine_switch (from_=from_@entry=0x5555583464d0, to_=to_@entry=0x5555572f9890, action=action@entry=COROUTINE_ENTER) at util/coroutine-ucontext.c:175
  #1  0x0000555555abfe54 in qemu_coroutine_enter (co=0x5555572f9890) at util/qemu-coroutine.c:117
  #2  0x0000555555ac031c in qemu_co_queue_run_restart (co=co@entry=0x5555583462c0) at util/qemu-coroutine-lock.c:60
  #3  0x0000555555abfe5e in qemu_coroutine_enter (co=0x5555583462c0) at util/qemu-coroutine.c:119
  #4  0x0000555555a4b663 in qemu_laio_process_completions (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:218
  #5  0x0000555555a4b874 in ioq_submit (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:331
  #6  0x0000555555a4ba12 in laio_do_submit (fd=fd@entry=13, laiocb=laiocb@entry=0x55555a338b40, offset=offset@entry=2911477760, type=type@entry=1) at block/linux-aio.c:383
  #7  0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>, s=0x555557e2f7f0, fd=13, offset=2911477760, qiov=0x55555a338e80, type=1) at block/linux-aio.c:402
  #8  0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x55555663bcb0, offset=offset@entry=2911477760, bytes=bytes@entry=8192, qiov=qiov@entry=0x55555a338e80, flags=0) at block/io.c:804
  #9  0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x55555663bcb0, req=req@entry=0x55555a338d80, offset=offset@entry=2911477760, bytes=bytes@entry=8192, align=align@entry=512, qiov=qiov@entry=0x55555a338e80, flags=0) at block/io.c:1041
  #10 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=2911477760, bytes=8192, qiov=qiov@entry=0x55555a338e80, flags=flags@entry=0) at block/io.c:1133
  #11 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, offset=6157475840, bytes=8192, qiov=0x5555575df720, flags=<optimized out>) at block/qcow2.c:1509
  #12 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x555556635890, offset=offset@entry=6157475840, bytes=bytes@entry=8192, qiov=qiov@entry=0x5555575df720, flags=0) at block/io.c:804
  #13 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x555556635890, req=req@entry=0x55555a339060, offset=offset@entry=6157475840, bytes=bytes@entry=8192, align=align@entry=1, qiov=qiov@entry=0x5555575df720, flags=0) at block/io.c:1041
  #14 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=offset@entry=6157475840, bytes=bytes@entry=8192, qiov=qiov@entry=0x5555575df720, flags=flags@entry=0) at block/io.c:1133
  #15 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, offset=6157475840, bytes=8192, qiov=0x5555575df720, flags=0) at block/block-backend.c:783
  #16 0x0000555555a45266 in blk_aio_read_entry (opaque=0x555557231aa0) at block/block-backend.c:991
  #17 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:78

Use the new qemu_coroutine_entered() function instead of comparing
against qemu_coroutine_self().  This is correct because:

1. If a coroutine is not entered then it must have yielded to wait for
   I/O completion.  It is therefore safe to enter.

2. If a coroutine is entered then it must be in
   ioq_submit()/qemu_laio_process_completions() because otherwise it
   would be yielded while waiting for I/O completion.  Therefore it will
   check laio->ret and return from ioq_submit() instead of yielding,
   i.e. it's guaranteed not to hang.

Reported-by: Fam Zheng <famz@redhat.com>
Tested-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1474989516-18255-4-git-send-email-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-28 17:11:23 +01:00
Pavel Dovgalyuk 6d0ceb80ff replay: allow replay stopping and restarting
This patch fixes bug with stopping and restarting replay
through monitor.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20160926080815.6992.71818.stgit@PASHA-ISP>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-27 11:57:30 +02:00
Peter Maydell 3b71ec8516 Block layer patches
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJX5RkyAAoJEH8JsnLIjy/WMnYP/i03Ii6nt8VY2PMX10NK3z+H
 5QskSnGzBgQg0OVFPQkvV1nW8bXPN07n/L2Q0H9FzQXTUzCbAkQC4MkzBZTMgaUp
 63XVRG151+AjqctvIncWQktUgMg6ywKBCug8G6gwMQ1BVsLerUJmE2tM0JGZF6WL
 q+F6uTtVMLNKR6miaTsJtFJA+ts9gGEecGNITRByEWbtD7ESF8TOXhCfpV/Ni45s
 5pMVOQT0o9jwgjekrUbJ9PrlxGSLfe/bi3ypqy6boe8EXQS6DImAiDKWkTQjGj2P
 cGlZ0oxWHq0g/15sgZOzqDtOXqE02+RK9C1UcASgObfdUos6gFpi6G7cAffzP9aq
 wzhi8u8Beth2WZP8tLLAJtoQjJhcjutWSw60HPhEcHZNkkaZF3KMXGDk8j1xp5EN
 8VVkWVZr+1ZOP+HsHCNrpvag+IP49DJ5j50W+oVqSKSzG37AC6SxNo3PEsg1urAi
 m5APFJbCgxMfDSnb8yw6ClBIe1IH624VMoELVzY1C9AqOnTxfJ08o4MbE8t1DEoa
 dQV/R3Mrqt3W+YflWNPXzVhRxD/mZW+RXPlBARYvFzHW12fh1PcwFWrB+ivdqu+4
 x0VvmZebJcEP+CxmRQOat80Zhos1Hs+ZyxUHyA5a0+Nt3YhK7k3sr4CQXxr5MIHA
 1c/D8znmWf9VksmW33U5
 =JG9f
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

# gpg: Signature made Fri 23 Sep 2016 12:59:46 BST
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (33 commits)
  block: Remove BB interface from blockdev-add/del
  qemu-iotests/141: Avoid blockdev-add with id
  block: Avoid printing NULL string in error messages
  qemu-iotests/139: Avoid blockdev-add with id
  qemu-iotests/124: Avoid blockdev-add with id
  qemu-iotests/118: Avoid blockdev-add with id
  qemu-iotests/117: Avoid blockdev-add with id
  qemu-iotests/087: Avoid blockdev-add with id
  qemu-iotests/081: Avoid blockdev-add with id
  qemu-iotests/071: Avoid blockdev-add with id
  qemu-iotests/067: Avoid blockdev-add with id
  qemu-iotests/041: Avoid blockdev-add with id
  qemu-iotests/118: Test media change with qdev name
  block: Accept device model name for block_set_io_throttle
  block: Accept device model name for blockdev-change-medium
  block: Accept device model name for eject
  block: Accept device model name for x-blockdev-remove-medium
  block: Accept device model name for x-blockdev-insert-medium
  block: Accept device model name for blockdev-open/close-tray
  qdev-monitor: Add blk_by_qdev_id()
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-09-23 16:15:33 +01:00
Peter Maydell 4c892756fd -----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
 
 iQEcBAABCAAGBQJX5LZ0AAoJEMo1YkxqkXHGmgAIAKeDTKx0sA76ewIvH9fKdmUu
 NNatJw59XnVX8lpfOU5yISkJ4BD6oBdN7tbWaOW8yzcAeYu1Ff5iUu4LBEUFb7eW
 g6zqUCV58XjaCTLmTiAfa19Exfnh6pXZlZMRP4Hr3vUVSCHFmC0EyTEllfHxU/jW
 aPHtAEge/p6EDAHygHJBTSQzsaXRdyJNyt/AKPreDtblNRT8VgapCDzZQPcCVGH1
 F9grWVu0B/VVDS0mfgSRhT0UeF/vtiikuRW92sC4woVVB+brJyG4VwGT8oeUN8RU
 30/tGo5p9fpqef3iP669uUrloLfmWcKcIJuPfQ4ZUlZh8kIV+lWK9kZuTVgocGw=
 =xLJw
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/famz/tags/various-pull-request' into staging

# gpg: Signature made Fri 23 Sep 2016 05:58:28 BST
# gpg:                using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <famz@redhat.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021  AD56 CA35 624C 6A91 71C6

* remotes/famz/tags/various-pull-request: (23 commits)
  docker: exec $CMD
  docker: Terminate instances at SIGTERM and SIGHUP
  docker: Support showing environment information
  docker: Print used options before doing configure
  docker: Flatten default target list in test-quick
  docker: Update fedora image to latest
  docker: Generate /packages.txt in ubuntu image
  docker: Generate /packages.txt in fedora image
  docker: Generate /packages.txt in centos6 image
  tests: Ignore test-uuid
  Add UUID files to MAINTAINERS
  tests: Add uuid tests
  uuid: Tighten uuid parse
  vl: Switch qemu_uuid to QemuUUID
  configure: Remove detection code for UUID
  tests: No longer dependent on CONFIG_UUID
  crypto: Switch to QEMU UUID API
  vpc: Use QEMU UUID API
  vdi: Use QEMU UUID API
  vhdx: Use QEMU UUID API
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

# Conflicts:
#	tests/Makefile.include
2016-09-23 13:10:43 +01:00
Kevin Wolf 1c89e1fa2f block: Add blk_by_dev()
This finds a BlockBackend given the device model that is attached to it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-09-23 13:36:10 +02:00
Alberto Garcia 0fe282bb4b commit: Add 'base' to the reopen queue before 'overlay_bs'
Now that we're checking for duplicates in the reopen queue, there's no
need to force a specific order in which the queue is constructed so we
can revert 3db2bd5508.

Since both ways of constructing the queue are now valid, this patch
doesn't have any effect on the behavior of QEMU and is not strictly
necessary. However it can help us check that the fix for the reopen
queue is robust: if it stops working properly at some point, iotest
040 will break.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-23 13:36:10 +02:00
Alberto Garcia f87a0e29a9 block: Add "read-only" to the options QDict
This adds the "read-only" option to the QDict. One important effect of
this change is that when a child inherits options from its parent, the
existing "read-only" mode can be preserved if it was explicitly set
previously.

This addresses scenarios like this:

   [E] <- [D] <- [C] <- [B] <- [A]

In this case, if we reopen [D] with read-only=off, and later reopen
[B], then [D] will not inherit read-only=on from its parent during the
bdrv_reopen_queue_child() stage.

The BDRV_O_RDWR flag is not removed yet, but its keep in sync with the
value of the "read-only" option.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-23 13:36:10 +02:00
Daniel P. Berrange bb9f8dd0e1 qcow2: fix encryption during cow of sectors
Broken in previous commit:

  commit aaa4d20b49
  Author: Kevin Wolf <kwolf@redhat.com>
  Date:   Wed Jun 1 15:21:05 2016 +0200

      qcow2: Make copy_sectors() byte based

The copy_sectors() code was originally using the 'sector'
parameter for encryption, which was passed in by the caller
from the QCowL2Meta.offset field (aka the guest logical
offset).

After the change, the code is using 'cluster_offset' which
was passed in from QCow2L2Meta.alloc_offset field (aka the
host physical offset).

This would cause the data to be encrypted using an incorrect
initialization vector which will in turn cause later reads
to return garbage.

Although current qcow2 built-in encryption is blocked from
usage in the emulator, one could still hit this if writing
to the file via qemu-{img,io,nbd} commands.

Cc: qemu-stable@nongnu.org
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-23 13:36:09 +02:00
Peter Maydell 6de68ffd7c * More KVM LAPIC fixes
* fix divide-by-zero regression on libiscsi SG devices
 * fix qemu-char segfault
 * add scripts/show-fixed-bugs.sh
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQExBAABCAAbBQJX5CEJFBxwYm9uemluaUByZWRoYXQuY29tAAoJEL/70l94x66D
 PVcH/A6kHS5FHBrqLGl0BRUzZU0HSJJVOofAB55/60qknNajMSTAWM2i/mkNxs6P
 6MJD88Pb+aJFRlP6qD7CLGIlIYR38hH8VFTViuW9/Q+BuMMFgkqUauVCr3maRs82
 zH6o8wj/6PEi6yjBz3yVQCKI9+dHrXiv3BdIml88CP3jAMRjit7Hzcgad7NBmMjg
 Try4ipTxLq0qTkMPNn5IewtiARtJH6UInrE3pzbHKekORSvammJ8Lej0mqXYNACl
 fatQSXTiMc1DX4Mb3Bph6rm1FzmzRGKxydaidVZdswhj2+voMgt3Xvt0mfcaD8Gk
 aNbuHlj2tZ4zPIP08V21xYfd1xc=
 =EsZw
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* More KVM LAPIC fixes
* fix divide-by-zero regression on libiscsi SG devices
* fix qemu-char segfault
* add scripts/show-fixed-bugs.sh

# gpg: Signature made Thu 22 Sep 2016 19:20:57 BST
# gpg:                using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream:
  kvm: fix events.flags (KVM_VCPUEVENT_VALID_SMM) overwritten by 0
  scripts: Add a script to check for bug URLs in the git log
  msmouse: Fix segfault caused by free the chr before chardev cleanup.
  iscsi: Fix divide-by-zero regression on raw SG devices
  kvm: apic: set APIC base as part of kvm_apic_put
  target-i386: introduce kvm_put_one_msr

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-09-23 12:12:55 +01:00
Fam Zheng 38440a21fa vpc: Use QEMU UUID API
Previously we conditionally generated footer->uuid, when libuuid was
available. Now that we have a built-in implementation, we can switch to
it.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-Id: <1474432046-325-6-git-send-email-famz@redhat.com>
2016-09-23 11:42:52 +08:00
Fam Zheng 7c6f55b697 vdi: Use QEMU UUID API
The UUID operations we need from libuuid are fully supported by QEMU UUID
implementation. Use it, and remove the unused code.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-Id: <1474432046-325-5-git-send-email-famz@redhat.com>
2016-09-23 11:42:52 +08:00
Fam Zheng cb6414dfec vhdx: Use QEMU UUID API
This removes our dependency to libuuid, so that the driver can always be
built.

Similar to how we handled data plane configure options, --enable-vhdx
and --disable-vhdx are also changed to a nop with a message saying it's
obsolete.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-Id: <1474432046-325-4-git-send-email-famz@redhat.com>
2016-09-23 11:42:52 +08:00
Fam Zheng cea25275a3 util: Add UUID API
A number of different places across the code base use CONFIG_UUID. Some
of them are soft dependency, some are not built if libuuid is not
available, some come with dummy fallback, some throws runtime error.

It is hard to maintain, and hard to reason for users.

Since UUID is a simple standard with only a small number of operations,
it is cleaner to have a central support in libqemuutil. This patch adds
qemu_uuid_* functions that all uuid users in the code base can
rely on. Except for qemu_uuid_generate which is new code, all other
functions are just copy from existing fallbacks from other files.

Note that qemu_uuid_parse is moved without updating the function
signature to use QemuUUID, to keep this patch simple.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-Id: <1474432046-325-2-git-send-email-famz@redhat.com>
2016-09-23 11:42:52 +08:00
Eric Blake 95eaa78537 iscsi: Fix divide-by-zero regression on raw SG devices
When qemu uses iscsi devices in sg mode, iscsilun->block_size
is left at 0.  Prior to commits cf081fca and similar, when
block limits were tracked in sectors, this did not matter:
various block limits were just left at 0.  But when we started
scaling by block size, this caused SIGFPE.

Then, in a later patch, commit a5b8dd2c added an assertion to
bdrv_open_common() that request_alignment is always non-zero;
which was not true for SG mode.  Rather than relax that assertion,
we can just provide a sane value (we don't know of any SG device
with a block size smaller than qemu's default sizing of 512 bytes).

One possible solution for SG mode is to just blindly skip ALL
of iscsi_refresh_limits(), since we already short circuit so
many other things in sg mode.  But this patch takes a slightly
more conservative approach, and merely guarantees that scaling
will succeed, while still using multiples of the original size
where possible.  Resulting limits may still be zero in SG mode
(that is, we mostly only fix block_size used as a denominator
or which affect assertions, not all uses).

Reported-by: Holger Schranz <holger@fam-schranz.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
CC: qemu-stable@nongnu.org

Message-Id: <1473283640-15756-1-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-22 20:20:51 +02:00
Alberto Garcia 4d6f8cbba7 commit: get the overlay node before manipulating the backing chain
The 'block-commit' command has a 'top' parameter to specify the
topmost node from which the data is going to be copied.

   [E] <- [D] <- [C] <- [B] <- [A]

In this case if [C] is the top node then this is the result:

   [E] <- [B] <- [A]

[B] must be modified so its backing image string points to [E] instead
of [C]. commit_start() takes care of reopening [B] in read-write
mode, and commit_complete() puts it back in read-only mode once the
operation has finished.

In order to find [B] (the overlay node) we look for the node that has
[C] (the top node) as its backing image. However in commit_complete()
we're doing it after [C] has been removed from the chain, so [B] is
never found and remains in read-write mode.

This patch gets the overlay node before the backing chain is
manipulated.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 1471836963-28548-1-git-send-email-berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-09-20 22:12:57 +02:00
Colin Lord 4be4879ff8 blockdev: Modularize nfs block driver
Modularizes the nfs block driver so that it gets dynamically loaded.

Signed-off-by: Colin Lord <clord@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1471008424-16465-5-git-send-email-clord@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-09-20 22:12:57 +02:00
Marc Mari 88d88798b7 blockdev: Add dynamic module loading for block drivers
Extend the current module interface to allow for block drivers to be
loaded dynamically on request. The only block drivers that can be
converted into modules are the drivers that don't perform any init
operation except for registering themselves.

In addition, only the protocol drivers are being modularized, as they
are the only ones which see significant performance benefits. The format
drivers do not generally link to external libraries, so modularizing
them is of no benefit from a performance perspective.

All the necessary module information is located in a new structure found
in module_block.h

This spoils the purpose of 5505e8b76f (block/dmg: make it modular).

Before this patch, if module build is enabled, block-dmg.so is linked to
libbz2, whereas the main binary is not. In downstream, theoretically, it
means only the qemu-block-extra package depends on libbz2, while the
main QEMU package needn't to. With this patch, we (temporarily) change
the case so that the main QEMU depends on libbz2 again.

Signed-off-by: Marc Marí <markmb@redhat.com>
Signed-off-by: Colin Lord <clord@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1471008424-16465-4-git-send-email-clord@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
[mreitz: Do a signed comparison against the length of
 block_driver_modules[], so it will not cause a compile error when
 empty]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-09-20 22:12:03 +02:00
Colin Lord f57b4b5fb1 blockdev: prepare iSCSI block driver for dynamic loading
This commit moves the initialization of the QemuOptsList qemu_iscsi_opts
struct out of block/iscsi.c in order to allow the iscsi module to be
dynamically loaded.

Signed-off-by: Colin Lord <clord@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1471008424-16465-2-git-send-email-clord@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-09-20 22:10:57 +02:00
Daniel P. Berrange 3bd18890ca crypto: make PBKDF iterations configurable for LUKS format
As protection against bruteforcing passphrases, the PBKDF
algorithm is tuned by counting the number of iterations
needed to produce 1 second of running time. If the machine
that the image will be used on is much faster than the
machine where the image is created, it can be desirable
to raise the number of iterations. This change adds a new
'iter-time' property that allows the user to choose the
iteration wallclock time.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-09-19 16:30:45 +01:00
Laurent Vivier 11d816a5bc sheepdog: remove useless casts
This patch is the result of coccinelle script
scripts/coccinelle/typecast.cocci

CC: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
CC: qemu-block@nongnu.org
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-09-15 15:32:22 +03:00
Tomáš Golembiovský a41c457881 curl: Operate on zero-length file
Another attempt to fix the bug 1596870.

When creating new disk backed by remote file accessed via HTTPS and the
backing file has zero length, qemu-img terminates with uniformative
error message:

    qemu-img: disk.qcow2: CURL: Error opening file:

While it may not make much sense to operate on empty file, other block
backends (e.g. raw backend for regular files) seem to allow it. This
patch fixes it for the curl backend and improves the reported error.

Signed-off-by: Tomáš Golembiovský <tgolembi@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-09-15 15:32:22 +03:00
Ladi Prosek d4b84d564e Remove unused function declarations
Unused function declarations were found using a simple gcc plugin and
manually verified by grepping the sources.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-09-15 15:32:22 +03:00
Peter Maydell 4dfbe3767a Pull request
v2:
  * Fixed qcow2 sanitizer warnings [Peter]
  * Renamed get_error test cases to get_error_all to avoid tripping "error:"
    grep scripts [Peter]
  * Added Fam's iothread stop patch
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJX1862AAoJEJykq7OBq3PIK/YH/jLBxEu/H4hu+NzR7V7ur8lv
 F5E633tVzfxC7iM5CGUV+6qKM1UsIXvXn98JDxuGPh8aZaPdhsaalrS5iAJby6LN
 nPxam3ps1RlPeFA3hS3OqMizlO0qhp2gQZ5T6KPEktNXgSKP3kyiumXDo26XQoj6
 dtW1yap9lcvk3z9XBdTTpAfPLmAurvcxUptZsxQgo6O6/7jCuLyxEDuFenL13tOU
 vUvdrWvBn44YlzDjJc75nw2EBnCFLphw6SCj6seMj8RXDn2JUdVM0DQIZ2HQtBcT
 YlVjYl8hNyBwRZGi1QjSzdIhPYZQen0gg3EVJUBdwg/zgkrS9w91JImSwJGMxyg=
 =AHxB
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

Pull request

v2:
 * Fixed qcow2 sanitizer warnings [Peter]
 * Renamed get_error test cases to get_error_all to avoid tripping "error:"
   grep scripts [Peter]
 * Added Fam's iothread stop patch

# gpg: Signature made Tue 13 Sep 2016 11:02:30 BST
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/block-pull-request:
  iothread: Stop threads before main() quits
  tests: fix qvirtqueue_kick
  MAINTAINERS: add maintainer for replication
  support replication driver in blockdev-add
  tests: add unit test case for replication
  replication: Implement new driver for block replication
  replication: Introduce new APIs to do replication operation
  configure: support replication
  mirror: auto complete active commit
  docs: block replication's description
  block: Link backup into block core
  Backup: export interfaces for extra serialization
  Backup: clear all bitmap when doing block checkpoint
  block: unblock backup operations in backing file
  virtio-blk: rename virtio_device_info to virtio_blk_info
  linux-aio: process completions from ioq_submit()
  linux-aio: split processing events function
  linux-aio: consume events in userspace instead of calling io_getevents
  qcow2: avoid memcpy(dst, NULL, len)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-09-13 14:31:18 +01:00
Wen Congyang 29ff789060 replication: Implement new driver for block replication
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Message-id: 1469602913-20979-10-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:56 +01:00
Wen Congyang b49f7ead8d mirror: auto complete active commit
Auto complete mirror job in background to prevent from
blocking synchronously

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Message-id: 1469602913-20979-7-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:56 +01:00
Wen Congyang 258854ad1d block: Link backup into block core
Some programs that add a dependency on it will use
the block layer directly.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1469602913-20979-5-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:56 +01:00
Changlong Xie a8bbee0edf Backup: export interfaces for extra serialization
Normal backup(sync='none') workflow:
step 1. NBD peformance I/O write from client to server
   qcow2_co_writev
    bdrv_co_writev
     ...
       bdrv_aligned_pwritev
        notifier_with_return_list_notify -> backup_do_cow
         bdrv_driver_pwritev // write new contents

step 2. drive-backup sync=none
   backup_do_cow
   {
    wait_for_overlapping_requests
    cow_request_begin
    for(; start < end; start++) {
            bdrv_co_readv_no_serialising //read old contents from Secondary disk
            bdrv_co_writev // write old contents to hidden-disk
    }
    cow_request_end
   }

step 3. Then roll back to "step 1" to write new contents to Secondary disk.

And for replication, we must make sure that we only read the old contents from
Secondary disk in order to keep contents consistent.

1) Replication workflow of Secondary
                                                         virtio-blk
                                                              ^
------->  1 NBD                                               |
   ||     server                                       3 replication
   ||        ^                                                ^
   ||        |           backing                 backing      |
   ||  Secondary disk 6<-------- hidden-disk 5 <-------- active-disk 4
   ||        |                         ^
   ||        '-------------------------'
   ||           drive-backup sync=none 2

Hence, we need these interfaces to implement coarse-grained serialization between
COW of Secondary disk and the read operation of replication.

Example codes about how to use them:

*#include "block/block_backup.h"

static coroutine_fn int xxx_co_readv()
{
        CowRequest req;
        BlockJob *job = secondary_disk->bs->job;

        if (job) {
              backup_wait_for_overlapping_requests(job, start, end);
              backup_cow_request_begin(&req, job, start, end);
              ret = bdrv_co_readv();
              backup_cow_request_end(&req);
              goto out;
        }
        ret = bdrv_co_readv();
out:
        return ret;
}

Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1469602913-20979-4-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:56 +01:00
Wen Congyang 49d3e828f8 Backup: clear all bitmap when doing block checkpoint
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1469602913-20979-3-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:56 +01:00
Roman Pen 0ed93d84ed linux-aio: process completions from ioq_submit()
In order to reduce completion latency it makes sense to harvest completed
requests ASAP.  Very fast backend device can complete requests just after
submission, so it is worth trying to check ring buffer in order to peek
completed requests directly after io_submit() has been called.

Indeed, this patch reduces the completions latencies and increases the
overall throughput, e.g. the following is the percentiles of number of
completed requests at once:

        1th 10th  20th  30th  40th  50th  60th  70th  80th  90th  99.99th
Before    2    4    42   112   128   128   128   128   128   128    128
 After    1    1     4    14    33    45    47    48    50    51    108

That means, that before the current patch is applied the ring buffer is
observed as full (128 requests were consumed at once) in 60% of calls.

After patch is applied the distribution of number of completed requests
is "smoother" and the queue (requests in-flight) is almost never full.

The fio read results are the following (write results are almost the
same and are not showed here):

  Before
  ------
job: (groupid=0, jobs=8): err= 0: pid=2227: Tue Jul 19 11:29:50 2016
  Description  : [Emulation of Storage Server Access Pattern]
  read : io=54681MB, bw=1822.7MB/s, iops=179779, runt= 30001msec
    slat (usec): min=172, max=16883, avg=338.35, stdev=109.66
    clat (usec): min=1, max=21977, avg=1051.45, stdev=299.29
     lat (usec): min=317, max=22521, avg=1389.83, stdev=300.73
    clat percentiles (usec):
     |  1.00th=[  346],  5.00th=[  596], 10.00th=[  708], 20.00th=[  852],
     | 30.00th=[  932], 40.00th=[  996], 50.00th=[ 1048], 60.00th=[ 1112],
     | 70.00th=[ 1176], 80.00th=[ 1256], 90.00th=[ 1384], 95.00th=[ 1496],
     | 99.00th=[ 1800], 99.50th=[ 1928], 99.90th=[ 2320], 99.95th=[ 2672],
     | 99.99th=[ 4704]
    bw (KB  /s): min=205229, max=553181, per=12.50%, avg=233278.26, stdev=18383.51

  After
  ------
job: (groupid=0, jobs=8): err= 0: pid=2220: Tue Jul 19 11:31:51 2016
  Description  : [Emulation of Storage Server Access Pattern]
  read : io=57637MB, bw=1921.2MB/s, iops=189529, runt= 30002msec
    slat (usec): min=169, max=20636, avg=329.61, stdev=124.18
    clat (usec): min=2, max=19592, avg=988.78, stdev=251.04
     lat (usec): min=381, max=21067, avg=1318.42, stdev=243.58
    clat percentiles (usec):
     |  1.00th=[  310],  5.00th=[  580], 10.00th=[  748], 20.00th=[  876],
     | 30.00th=[  908], 40.00th=[  948], 50.00th=[ 1012], 60.00th=[ 1064],
     | 70.00th=[ 1080], 80.00th=[ 1128], 90.00th=[ 1224], 95.00th=[ 1288],
     | 99.00th=[ 1496], 99.50th=[ 1608], 99.90th=[ 1960], 99.95th=[ 2256],
     | 99.99th=[ 5408]
    bw (KB  /s): min=212149, max=390160, per=12.49%, avg=245746.04, stdev=11606.75

Throughput increased from 1822MB/s to 1921MB/s, average completion latencies
decreased from 1051us to 988us.

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Message-id: 1468931263-32667-4-git-send-email-roman.penyaev@profitbricks.com
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:56 +01:00
Roman Pen 3407de572b linux-aio: split processing events function
Prepare processing events function to be called from ioq_submit(),
thus split function on two parts: the first harvests completed IO
requests, the second submits pending requests.

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Message-id: 1468931263-32667-3-git-send-email-roman.penyaev@profitbricks.com
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:55 +01:00
Roman Pen 9e909a5829 linux-aio: consume events in userspace instead of calling io_getevents
AIO context in userspace is represented as a simple ring buffer, which
can be consumed directly without entering the kernel, which obviously
can bring some performance gain.  QEMU does not use timeout value for
waiting for events completions, so we can consume all events from
userspace.

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Message-id: 1468931263-32667-2-git-send-email-roman.penyaev@profitbricks.com
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:55 +01:00
Stefan Hajnoczi 0647d47cc1 qcow2: avoid memcpy(dst, NULL, len)
Section "7.1.4 Use of library functions" in the C99 standard says:

  If an argument to a function has an invalid value (such as [...]
  a null pointer [...]) [...] the behavior is undefined.

Additionally the "searching and sorting" functions are specified as
requiring valid pointer values as described in 7.1.4.

This patch fixes the following sanitizer errors:

  block/qcow2.c:1807:41: runtime error: null pointer passed as argument 2, which is declared to never be null
  block/qcow2-cluster.c:86:26: runtime error: null pointer passed as argument 2, which is declared to never be null

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1473758138-19260-1-git-send-email-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:55 +01:00
Prasanna Kumar Kalever e9db8ff38e block/gluster: add support to choose libgfapi logfile
currently all the libgfapi logs defaults to '/dev/stderr' as it was hardcoded
in a call to glfs logging api. When the debug level is chosen to DEBUG/TRACE,
gfapi logs will be huge and fill/overflow the console view.

This patch provides a commandline option to mention log file path which helps
in logging to the specified file and also help in persisting the gfapi logs.

Usage:
-----
 *URI Style:
  ---------
  -drive file=gluster://hostname/volname/image.qcow2,file.debug=9,\
                      file.logfile=/var/log/qemu/qemu-gfapi.log

 *JSON Style:
  ----------
  'json:{
           "driver":"qcow2",
           "file":{
              "driver":"gluster",
              "volume":"volname",
              "path":"image.qcow2",
              "debug":"9",
              "logfile":"/var/log/qemu/qemu-gfapi.log",
              "server":[
                 {
                    "type":"tcp",
                    "host":"1.2.3.4",
                    "port":24007
                 },
                 {
                    "type":"unix",
                    "socket":"/var/run/glusterd.socket"
                 }
              ]
           }
        }'

Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-09-13 01:34:47 -04:00
Pavel Butsykin 8b2bd09338 qcow2: fix iovec size at qcow2_co_pwritev_compressed
Use bytes as the size would be more exact than s->cluster_size.  Although
qemu_iovec_to_buf() will not allow to go beyond the qiov.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05 19:06:48 +02:00
Pavel Butsykin 13b9414b57 drive-backup: added support for data compression
The idea is simple - backup is "written-once" data. It is written block
by block and it is large enough. It would be nice to save storage
space and compress it.

The patch adds a flag to the qmp/hmp drive-backup command which enables
block compression. Compression should be implemented in the format driver
to enable this feature.

There are some limitations of the format driver to allow compressed writes.
We can write data only once. Though for backup this is perfectly fine.
These limitations are maintained by the driver and the error will be
reported if we are doing something wrong.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05 19:06:48 +02:00
Pavel Butsykin 3ea1a09111 block/io: turn on dirty_bitmaps for the compressed writes
Previously was added the assert:

  commit 1755da16e3
  Author: Paolo Bonzini <pbonzini@redhat.com>
  Date:   Thu Oct 18 16:49:18 2012 +0200
  block: introduce new dirty bitmap functionality

Now the compressed write is always in coroutine and setting the bits is
done after the write, so that we can return the dirty_bitmaps for the
compressed writes.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05 19:06:48 +02:00
Pavel Butsykin 35fadca80e block: remove BlockDriver.bdrv_write_compressed
There are no block drivers left that implement the old
.bdrv_write_compressed interface, so it can be removed. Also now we have
no need to use the bdrv_pwrite_compressed function and we can remove it
entirely.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05 19:06:48 +02:00
Pavel Butsykin 655923df4b qcow: cleanup qcow_co_pwritev_compressed to avoid the recursion
Now that the function uses a vector instead of a buffer, there is no
need to use recursive code.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05 19:06:48 +02:00
Pavel Butsykin f2b95a1231 qcow: add qcow_co_pwritev_compressed
Added implementation of the qcow_co_pwritev_compressed function that
will allow us to safely use compressed writes for the qcow from running
VMs.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05 19:06:48 +02:00
Pavel Butsykin b2c622d365 vmdk: add vmdk_co_pwritev_compressed
Added implementation of the vmdk_co_pwritev_compressed function that
will allow us to safely use compressed writes for the vmdk from running
VMs.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05 19:06:48 +02:00
Pavel Butsykin a2c0ca6f55 qcow2: cleanup qcow2_co_pwritev_compressed to avoid the recursion
Now that the function uses a vector instead of a buffer, there is no
need to use recursive code.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05 19:06:48 +02:00
Pavel Butsykin fcccefc57f qcow2: add qcow2_co_pwritev_compressed
Added implementation of the qcow2_co_pwritev_compressed function that
will allow us to safely use compressed writes for the qcow2 from running
VMs.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05 19:06:48 +02:00
Pavel Butsykin 29a298af9d block/io: reuse bdrv_co_pwritev() for write compressed
For bdrv_pwrite_compressed() it looks like most of the code creating
coroutine is duplicated in bdrv_prwv_co(). So we can just add a flag
(BDRV_REQ_WRITE_COMPRESSED) and use bdrv_prwv_co() as a generic one.
In the end we get coroutine oriented function for write compressed by using
bdrv_co_pwritev/blk_co_pwritev with BDRV_REQ_WRITE_COMPRESSED flag.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05 19:06:48 +02:00
Pavel Butsykin 751e2f0698 block: Convert bdrv_pwrite_compressed() to BdrvChild
Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05 19:06:47 +02:00
Pavel Butsykin fe5c1355e7 block: switch blk_write_compressed() to byte-based interface
This is a preparatory patch, which continues the general trend of the
transition to the byte-based interfaces. bdrv_check_request() and
blk_check_request() are no longer used, thus we can remove them.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05 19:06:47 +02:00
Kevin Wolf b6c1bae5df block: Accept node-name for block-stream
In order to remove the necessity to use BlockBackend names in the
external API, we want to allow node-names everywhere. This converts
block-stream to accept a node-name without lifting the restriction that
we're operating at a root node.

In case of an invalid device name, the command returns the GenericError
error class now instead of DeviceNotFound, because this is what
qmp_get_root_bs() returns.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2016-09-05 19:06:47 +02:00
Denis V. Lunev 156af3ac98 block: fix possible reorder of flush operations
This patch reduce CPU usage of flush operations a bit. When we have one
flush completed we should kick only next operation. We should not start
all pending operations in the hope that they will go back to wait on
wait_queue.

Also there is a technical possibility that requests will get reordered
with the previous approach. After wakeup all requests are removed from
the wait queue. They become active and they are processed one-by-one
adding to the wait queue in the same order. Though new flush can arrive
while all requests are not put into the queue.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Tested-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Message-id: 1471457214-3994-3-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-08-18 14:36:49 +01:00
Evgeny Yakovlev ce83ee57f6 block: fix deadlock in bdrv_co_flush
The following commit
    commit 3ff2f67a7c
    Author: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
    Date:   Mon Jul 18 22:39:52 2016 +0300
    block: ignore flush requests when storage is clean
has introduced a regression.

There is a problem that it is still possible for 2 requests to execute
in non sequential fashion and sometimes this results in a deadlock
when bdrv_drain_one/all are called for BDS with such stalled requests.

1. Current flushed_gen and flush_started_gen is 1.
2. Request 1 enters bdrv_co_flush to with write_gen 1 (i.e. the same
   as flushed_gen). It gets past flushed_gen != flush_started_gen and
   sets flush_started_gen to 1 (again, the same it was before).
3. Request 1 yields somewhere before exiting bdrv_co_flush
4. Request 2 enters bdrv_co_flush with write_gen 2. It gets past
   flushed_gen != flush_started_gen and sets flush_started_gen to 2.
5. Request 2 runs to completion and sets flushed_gen to 2
6. Request 1 is resumed, runs to completion and sets flushed_gen to 1.
   However flush_started_gen is now 2.

From here on out flushed_gen is always != to flush_started_gen and all
further requests will wait on flush_queue. This change replaces
flush_started_gen with an explicitly tracked active flush request.

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Message-id: 1471457214-3994-2-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-08-18 14:36:49 +01:00
Fam Zheng 92b6a16087 curl: Cast fd to int for DPRINTF
Currently "make docker-test-mingw@fedora" has a warning like:

    /tmp/qemu-test/src/block/curl.c: In function 'curl_sock_cb':
    /tmp/qemu-test/src/block/curl.c:172:6: warning: format '%d' expects
    argument of type 'int', but argument 4 has type 'curl_socket_t {aka long
    long unsigned int}'
         DPRINTF("CURL (AIO): Sock action %d on fd %d\n", action, fd);
          ^
    cc1: all warnings being treated as errors

Cast to int to suppress it.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1470027888-24381-1-git-send-email-famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2016-08-17 19:57:54 +08:00
Peter Maydell 66940d7491 Block layer patches for 2.7.0-rc3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJXscniAAoJEH8JsnLIjy/WrPMQAKSb0eHNaejFF5JuizOZpsAp
 oXf34TedMzSLFDYnKOJjDGUTep3abWq2+Za6H0+upF+4EhIgteP+F1q+vhq2f1Pw
 rSS1hK2gVLB3cwngB2/phq+l7lBywts1R9D1xpes9WavIhfQjrUUPT5YGSsILXlI
 ysBdbaOvVsu74QtW6MoNHHccCGaFAXkbpfSnjdWqlYUjO2wbSOE2odGCdCI/05mr
 bZl0gzXit4EA8eH3baNwf1BDF07fOfezRrhYgGrWxXNZ9XTXClZGn9ixtjXuutZM
 E3ef2RIo7ROeHb7Zz0kfc8g9W6WPEdDYEcbl2bSuL9RGRRaIfuA+KlyULoH5olEa
 VeYNT/NKUdbO8ogIw5dqHX8DSwSXFy0TnHenDB4LQ0CvtZOiyYFdaJwcp37R0pqX
 r/9HVpllSsVP+iGHCsWwsBh2rCUsXmia1NFRHQcGgZXOnijxU0yrfwJbMd0jrBkV
 Noo72hm42SARvf5YUjowgqTdSRM2cJyfkFwMDuF3IXmuG1IiYMkaD7veqhPng0BC
 xOD3balnzeQ3xirvzlijwLiVSdGOOMblNL5Fd6C61GxL4hO7InMATSluV8O+6hmm
 IwGo9jMEEZ6Cttd1VHULJnmY8WVWxcJg8Z6DBKGwUtz0UWmv20at1rLMt9dx3sBu
 U31TbsUdxYFNB01f6SLt
 =z+3e
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches for 2.7.0-rc3

# gpg: Signature made Mon 15 Aug 2016 14:55:46 BST
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream:
  iotests: Test case for wrong runtime option types
  block/nbd: Store runtime option values
  block/blkdebug: Store config filename
  block/nbd: Use QemuOpts for runtime options
  block/ssh: Use QemuOpts for runtime options

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-08-16 09:32:40 +01:00
Max Reitz 03504d05f0 block/nbd: Store runtime option values
Store the runtime option values in the BDRVNBDState so they can later be
used in nbd_refresh_filename() without having to directly access the
options QDict which may contain values of non-string types.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-08-15 15:52:29 +02:00
Max Reitz 036990d72b block/blkdebug: Store config filename
Store the configuration file's filename so it can later be used in
bdrv_refresh_filename() without having to directly access the options
QDict which may contain a value of a non-string type.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-08-15 15:52:28 +02:00
Max Reitz 7ccc44fd7d block/nbd: Use QemuOpts for runtime options
Using QemuOpts will prevent qemu from crashing if the input options have
not been validated (which is the case when they are specified on the
command line or in a json: filename) and some have the wrong type.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-08-15 15:52:28 +02:00
Max Reitz 8a6a80896d block/ssh: Use QemuOpts for runtime options
Using QemuOpts will prevent qemu from crashing if the input options have
not been validated (which is the case when they are specified on the
command line or in a json: filename) and some have the wrong type.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-08-15 15:52:28 +02:00
Laurent Vivier e723b87103 trace-events: fix first line comment in trace-events
Documentation is docs/tracing.txt instead of docs/trace-events.txt.

find . -name trace-events -exec \
     sed -i "s?See docs/trace-events.txt for syntax documentation.?See docs/tracing.txt for syntax documentation.?" \
     {} \;

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-id: 1470669081-17860-1-git-send-email-lvivier@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-08-12 10:36:01 +01:00
Kevin Wolf 44713c9e85 linux-aio: Handle io_submit() failure gracefully
It is generally not expected that io_submit() fails other than with
-EAGAIN, but corner cases like SELinux refusing I/O when permissions are
revoked are still possible. In this case, we shouldn't abort, but just
return an I/O error for the request.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1470741619-23231-1-git-send-email-kwolf@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-08-11 09:42:35 +01:00
Vladimir Sementsov-Ogievskiy dbaa7b57ec mirror: finish earlier on error
Stop to produce new async copy requests from mirror_iteration if
critical error (error action = BLOCK_ERROR_ACTION_REPORT) detected.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-08-08 13:05:43 +02:00
Klim Kireev 555a608c5d block/parallels: check new image size
Before this patch incorrect image could be created via qemu-img
(Example: qemu-img create -f parallels -o size=4096T hack.img),
incorrect images cannot be used due to overflow in main image structure.

This patch add check of size in image creation.

After reading size it compare it with UINT32_MAX * cluster_size.

Signed-off-by: Klim Kireev <proffk@virtuozzo.mipt.ru>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Message-id: 1469639300-12155-1-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-08-05 09:59:06 +01:00
Eric Blake b8d0a9804d block: Cater to iscsi with non-power-of-2 discard
Dell Equallogic iSCSI SANs have a very unusual advertised geometry:

$ iscsi-inq -e 1 -c $((0xb0)) iscsi://XXX/0
wsnz:0
maximum compare and write length:1
optimal transfer length granularity:0
maximum transfer length:0
optimal transfer length:0
maximum prefetch xdread xdwrite transfer length:0
maximum unmap lba count:30720
maximum unmap block descriptor count:2
optimal unmap granularity:30720
ugavalid:1
unmap granularity alignment:0
maximum write same length:30720

which says that both the maximum and the optimal discard size
is 15M.  It is not immediately apparent if the device allows
discard requests not aligned to the optimal size, nor if it
allows discards at a finer granularity than the optimal size.

I tried to find details in the SCSI Commands Reference Manual
Rev. A on what valid values of maximum and optimal sizes are
permitted, but while that document mentions a "Block Limits
VPD Page", I couldn't actually find documentation of that page
or what values it would have, or if a SCSI device has an
advertisement of its minimal unmap granularity.  So it is not
obvious to me whether the Dell Equallogic device is compliance
with the SCSI specification.

Fortunately, it is easy enough to support non-power-of-2 sizing,
even if it means we are less efficient than truly possible when
targetting that device (for example, it means that we refuse to
unmap anything that is not a multiple of 15M and aligned to a
15M boundary, even if the device truly does support a smaller
granularity where unmapping actually works).

Reported-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1469129688-22848-5-git-send-email-eblake@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-03 18:44:57 +02:00
Eric Blake 7423f41782 nbd: Limit nbdflags to 16 bits
Rather than asserting that nbdflags is within range, just give
it the correct type to begin with :)  nbdflags corresponds to
the per-export portion of NBD Protocol "transmission flags", which
is 16 bits in response to NBD_OPT_EXPORT_NAME and NBD_OPT_GO.

Furthermore, upstream NBD has never passed the global flags to
the kernel via ioctl(NBD_SET_FLAGS) (the ioctl was first
introduced in NBD 2.9.22; then a latent bug in NBD 3.1 actually
tried to OR the global flags with the transmission flags, with
the disaster that the addition of NBD_FLAG_NO_ZEROES in 3.9
caused all earlier NBD 3.x clients to treat every export as
read-only; NBD 3.10 and later intentionally clip things to 16
bits to pass only transmission flags).  Qemu should follow suit,
since the current two global flags (NBD_FLAG_FIXED_NEWSTYLE
and NBD_FLAG_NO_ZEROES) have no impact on the kernel's behavior
during transmission.

CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>

Message-Id: <1469129688-22848-3-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-03 18:44:56 +02:00
Peter Maydell df5c50a208 -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
 
 iQIcBAABAgAGBQJXl81aAAoJEL2+eyfA3jBXDdQP/2nKJtdW2LEGV/2G8akHPoLc
 NXMdYekBr3IiwZFkM4sbj8zNpF9wNPxq7dccF2d590BlwdDDSB1SwOH0owIdkFeQ
 fQ2p42qDnhzvX0F/riUg85kE3iCLgzQ7RYjgmVbE6jn8rSnLXsTeCsezjF4gn26y
 pxBmfQWpoNlZ6fPK1djMP1PVrABUqaj8T4KEynTQUDH1EMjaVlXmRKx5ziAyixCB
 84tsqUcwGKFWMKtpvtAKSGS+krSEOnRRYT3QRRVDbwWFFd+ANAlclZ51dPRJmKqd
 h+KiBbCyt64mGYHLBCasqr+ae5Va4993PQIbQY/nditk713fEUT49wCBKtcPrsbC
 SRpr+QygITeyE1Zc+hoPhEdO50c0/UMGrq8+wkVqH5bIeSDDBkbJmPZeQEyG5352
 U/pW90ODIx9tA9ihSu7K+hq4h9Z/sYEIJhPwnFGjOEqDTJ65slRyIz7axmRUhDk3
 /hN23gB+6UTzsJ7o6ZElQP8DWwjb6TKfrk51K1AJI5Zofywt7OJc5uFDARS4cJKv
 +1iBO1BDQFTF0a4GkGIydDbRlSa3TtOE68qK4fdOuf8L/f3f7ZhG66E0PsYvNsc6
 MJhl2do1Futl6iyidS+KbNaYUqLDh4VekOpb64P/7ZW5SWvYMLODep1fByTS6Kqs
 13Dc/03jiFkvph9EvrId
 =l58I
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging

# gpg: Signature made Tue 26 Jul 2016 21:51:38 BST
# gpg:                using RSA key 0xBDBE7B27C0DE3057
# gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>"
# gpg:                 aka "Jeffrey Cody <jeff@codyprime.org>"
# gpg:                 aka "Jeffrey Cody <codyprime@gmail.com>"
# Primary key fingerprint: 9957 4B4D 3474 90E7 9D98  D624 BDBE 7B27 C0DE 3057

* remotes/cody/tags/block-pull-request:
  mirror: double performance of the bulk stage if the disc is full
  block/gluster: fix doc in the qapi schema and member name

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-07-27 16:31:01 +01:00
Vladimir Sementsov-Ogievskiy 0965a41e99 mirror: double performance of the bulk stage if the disc is full
Mirror can do up to 16 in-flight requests, but actually on full copy
(the whole source disk is non-zero) in-flight is always 1. This happens
as the request is not limited in size: the data occupies maximum available
capacity of s->buf.

The patch limits the size of the request to some artificial constant
(1 Mb here), which is not that big or small. This effectively enables
back parallelism in mirror code as it was designed.

The result is important: the time to migrate 10 Gb disk is reduced from
~350 sec to 170 sec.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1468516741-82174-1-git-send-email-vsementsov@virtuozzo.com
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
CC: Eric Blake <eblake@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-26 16:23:36 -04:00
Daniel P. Berrange c7c4cf498f block: export LUKS specific data to qemu-img info
The qemu-img info command has the ability to expose format
specific metadata about volumes. Wire up this facility for
the LUKS driver to report on cipher configuration and key
slot usage.

    $ qemu-img info ~/VirtualMachines/demo.luks
    image: /home/berrange/VirtualMachines/demo.luks
    file format: luks
    virtual size: 98M (102760448 bytes)
    disk size: 100M
    encrypted: yes
    Format specific information:
        ivgen alg: plain64
        hash alg: sha1
        cipher alg: aes-128
        uuid: 6ddee74b-3a22-408c-8909-6789d4fa2594
        cipher mode: xts
        slots:
            [0]:
                active: true
                iters: 572706
                key offset: 4096
                stripes: 4000
            [1]:
                active: false
                key offset: 135168
            [2]:
                active: false
                key offset: 266240
            [3]:
                active: false
                key offset: 397312
            [4]:
                active: false
                key offset: 528384
            [5]:
                active: false
                key offset: 659456
            [6]:
                active: false
                key offset: 790528
            [7]:
                active: false
                key offset: 921600
        payload offset: 2097152
        master key iters: 142375

One somewhat undesirable artifact is that the data fields are
printed out in (apparently) random order. This will be addressed
later by changing the way the block layer pretty-prints the
image specific data.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1469192015-16487-3-git-send-email-berrange@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-07-26 17:46:37 +02:00
Vladimir Sementsov-Ogievskiy ebf7bba090 qcow2: do not allocate extra memory
There are no needs to allocate more than one cluster, as we set
avail_out for deflate to one cluster.

Zlib docs (http://www.zlib.net/manual.html) says:
"deflate compresses as much data as possible, and stops when the input
buffer becomes empty or the output buffer becomes full."

So, deflate will not write more than avail_out to output buffer. If
there is not enough space in output buffer for compressed data (it may
be larger than input data) deflate just returns Z_OK. (if all data is
compressed and written to output buffer deflate returns Z_STREAM_END).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 1468515565-81313-1-git-send-email-vsementsov@virtuozzo.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-07-26 17:46:37 +02:00
Peter Maydell 206d0c2436 pc, pci, virtio: new features, cleanups, fixes
- interrupt remapping for intel iommus
 - a bunch of virtio cleanups
 - fixes all over the place
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXkQsqAAoJECgfDbjSjVRpanoIAJ9JVlc1aEjt9sa0cSBcs+NQ
 J7JmgU9FqFsj+4FrNTouO3AxTjHurd1UAULP1WMPD+V3JpbnHct8r6SCBLQ5EBMN
 VOjYo4DwWs1g+DqnQ9WZmbadu06XvYi/yiAKNUzWfZk0MR11D0D/S5hmarNKw0Kq
 tGHeTWjGeY4WqFLV7m+qB4+cqkAByn6um99UtUvgLL05RgIEIP2IEMKYZ+rXvAa9
 iGUvzqlO7mbq/+LbL18kaWywa4TCwbbd2eSGWaqhX4CuB62Rl33mWTXFcfaYhkyp
 Z3FgwaJ09h0lAjSVEbyAuLFMfO/BnMcsoKqwl4xc4vkn/xBCqFtgH9JcEVm3O8U=
 =ge2D
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pc, pci, virtio: new features, cleanups, fixes

- interrupt remapping for intel iommus
- a bunch of virtio cleanups
- fixes all over the place

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Thu 21 Jul 2016 18:49:30 BST
# gpg:                using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream: (57 commits)
  intel_iommu: avoid unnamed fields
  virtio: Update migration docs
  virtio-gpu: Wrap in vmstate
  virtio-gpu: Use migrate_add_blocker for virgl migration blocking
  virtio-input: Wrap in vmstate
  9pfs: Wrap in vmstate
  virtio-serial: Wrap in vmstate
  virtio-net: Wrap in vmstate
  virtio-balloon: Wrap in vmstate
  virtio-rng: Wrap in vmstate
  virtio-blk: Wrap in vmstate
  virtio-scsi: Wrap in vmstate
  virtio: Migration helper function and macro
  virtio-serial: Remove old migration version support
  virtio-net: Remove old migration version support
  virtio-scsi: Replace HandleOutput typedef
  Revert "mirror: Workaround for unexpected iohandler events during completion"
  virtio-scsi: Call virtio_add_queue_aio
  virtio-blk: Call virtio_add_queue_aio
  virtio: Introduce virtio_add_queue_aio
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-07-21 20:12:37 +01:00
Fam Zheng d4a92a8420 Revert "mirror: Workaround for unexpected iohandler events during completion"
This reverts commit ab27c3b5e7.

The virtio storage device host notifiers now work with
bdrv_drained_begin/end, so we don't need this hack any more.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-21 20:44:19 +03:00
Peter Maydell 61ead113ae Pull request
v2:
  * Resolved merge conflict with block/iscsi.c [Peter]
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXj6TkAAoJEJykq7OBq3PI15oIAL24OFnMTFOAWyY2h3yfzGwe
 Gn1qzFOTHCXgbgVsolVuZnFU/SDn9WzMS9y6o7jcJykv++FdMSYUO7paPQAg9etX
 o9KOgyfUUDhskbaXEbTKxhqy7UvcyJsZYmjN+TD5tupfnrqGlmu6rkUPpSwGo++G
 u7CkAZewJc5SvsCdQRe3N10LbG4xU/j06cjCIHWDPRr+AV9qvsb9KkIOg3wkrhWN
 R83yj+yX6vhfQouHLoOlh3RYZ5HNo020JPum0jT1tGcoshKoFyu0prSRqNVVMqTY
 BepvWMhXbiq219HnFaQBtnysve9gWix4WhuCHtaAlaZDTBh945ZztsW7w7dubtA=
 =7ics
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

Pull request

v2:
 * Resolved merge conflict with block/iscsi.c [Peter]

# gpg: Signature made Wed 20 Jul 2016 17:20:52 BST
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/block-pull-request: (25 commits)
  raw_bsd: Convert to byte-based interface
  nbd: Convert to byte-based interface
  block: Kill .bdrv_co_discard()
  sheepdog: Switch .bdrv_co_discard() to byte-based
  raw_bsd: Switch .bdrv_co_discard() to byte-based
  qcow2: Switch .bdrv_co_discard() to byte-based
  nbd: Switch .bdrv_co_discard() to byte-based
  iscsi: Switch .bdrv_co_discard() to byte-based
  gluster: Switch .bdrv_co_discard() to byte-based
  blkreplay: Switch .bdrv_co_discard() to byte-based
  block: Add .bdrv_co_pdiscard() driver callback
  block: Convert .bdrv_aio_discard() to byte-based
  rbd: Switch rbd_start_aio() to byte-based
  raw-posix: Switch paio_submit() to byte-based
  block: Convert BB interface to byte-based discards
  block: Convert bdrv_aio_discard() to byte-based
  block: Switch BlockRequest to byte-based
  block: Convert bdrv_discard() to byte-based
  block: Convert bdrv_co_discard() to byte-based
  iscsi: Rely on block layer to break up large requests
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Conflicts:
	block/gluster.c
2016-07-21 11:00:36 +01:00
Eric Blake decaeed773 raw_bsd: Convert to byte-based interface
Since the raw format driver is just passing things through, we can
do byte-based read and write if the underlying protocol does
likewise.

There's one tricky part - if we probed the image format, we document
that we restrict operations on the initial sector.  It's easiest to
keep this guarantee by enforcing read-modify-write on sub-sector
operations (yes, this partially reverts commit ad82be2f).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1468624988-423-20-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:24:25 +01:00
Eric Blake 70c4fb2648 nbd: Convert to byte-based interface
The NBD protocol doesn't have any notion of sectors, so it is
a fairly easy conversion to use byte-based read and write.

Signed-off-by: Eric Blake <eblake@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468624988-423-19-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:24:25 +01:00
Eric Blake 02aefe43cb block: Kill .bdrv_co_discard()
Now that all drivers have a byte-based .bdrv_co_pdiscard(), we
no longer need to worry about the sector-based version.  We can
also relax our minimum alignment to 1 for drivers that support it.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-18-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:24:25 +01:00
Eric Blake dde4753763 sheepdog: Switch .bdrv_co_discard() to byte-based
Another step towards killing off sector-based block APIs.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-17-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:24:25 +01:00
Eric Blake 5f61ad079a raw_bsd: Switch .bdrv_co_discard() to byte-based
Another step towards killing off sector-based block APIs.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-16-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:24:25 +01:00
Eric Blake 82e8a7888b qcow2: Switch .bdrv_co_discard() to byte-based
Another step towards killing off sector-based block APIs.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-15-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:24:25 +01:00
Eric Blake 447e57c3b0 nbd: Switch .bdrv_co_discard() to byte-based
Another step towards killing off sector-based block APIs.

While at it, call directly into nbd-client.c instead of having
a pointless trivial wrapper in nbd.c.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-14-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:24:25 +01:00
Eric Blake 97c7e85cfe iscsi: Switch .bdrv_co_discard() to byte-based
Another step towards killing off sector-based block APIs.

Unlike write_zeroes, where we can be handed unaligned requests
and must fail gracefully with -ENOTSUP for a fallback, we are
guaranteed that discard requests are always aligned because the
block layer already ignored unaligned head/tail.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-13-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:24:25 +01:00
Eric Blake 1014170b82 gluster: Switch .bdrv_co_discard() to byte-based
Another step towards killing off sector-based block APIs.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-12-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:55 +01:00
Eric Blake aba76e2f03 blkreplay: Switch .bdrv_co_discard() to byte-based
Another step towards killing off sector-based block APIs.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-11-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:55 +01:00
Eric Blake 47a5486d59 block: Add .bdrv_co_pdiscard() driver callback
There's enough drivers with a sector-based callback that it will
be easier to switch one at a time.  This patch adds a byte-based
callback, and then after all drivers are swapped, we'll drop the
sector-based callback.

[checkpatch doesn't like the space after coroutine_fn in
block_int.h, but it's consistent with the rest of the file]

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-10-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:55 +01:00
Eric Blake 4da444a0bb block: Convert .bdrv_aio_discard() to byte-based
Another step towards byte-based interfaces everywhere.  Replace
the sector-based driver callback .bdrv_aio_discard() with a new
byte-based .bdrv_aio_pdiscard().  Only raw-posix and RBD drivers
are affected, so it was not worth splitting into multiple patches.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-9-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:55 +01:00
Eric Blake 7bbca9e290 rbd: Switch rbd_start_aio() to byte-based
The internal function converts to byte-based before calling into
RBD code; hoist the conversion to the callers so that callers
can then be switched to byte-based themselves.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-8-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:55 +01:00
Eric Blake 36e3b2e733 raw-posix: Switch paio_submit() to byte-based
The only remaining uses of paio_submit() were flush (with no
offset or count) and discard (which we are switching to byte-based);
furthermore, the similarly named paio_submit_co() is already
byte-based.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-7-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:55 +01:00
Eric Blake 1c6c4bb7f0 block: Convert BB interface to byte-based discards
Change sector-based blk_discard(), blk_co_discard(), and
blk_aio_discard() to instead be byte-based blk_pdiscard(),
blk_co_pdiscard(), and blk_aio_pdiscard().  NBD gets a lot
simpler now that ignoring the unaligned portion of a
byte-based discard request is handled under the hood by
the block layer.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-6-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:55 +01:00
Eric Blake 60ebac16bc block: Convert bdrv_aio_discard() to byte-based
Another step towards byte-based interfaces everywhere.  Replace
the sector-based bdrv_aio_discard() with a new byte-based
bdrv_aio_pdiscard(), which silently ignores any unaligned head
or tail.  Driver callbacks will be converted in followup patches.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1468624988-423-5-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:55 +01:00
Eric Blake b15404e027 block: Switch BlockRequest to byte-based
BlockRequest is the internal struct used by bdrv_aio_*.  At the
moment, all such calls were sector-based, but we will eventually
convert to byte-based; start by changing the internal variables
to be byte-based.  No change to behavior, although the read and
write code can now go byte-based through more of the stack.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1468624988-423-4-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:55 +01:00
Eric Blake 0c51a893b6 block: Convert bdrv_discard() to byte-based
Another step towards byte-based interfaces everywhere.  Replace
the sector-based bdrv_discard() with a new byte-based
bdrv_pdiscard(), which silently ignores any unaligned head
or tail.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-3-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:55 +01:00
Eric Blake 9f1963b3f7 block: Convert bdrv_co_discard() to byte-based
Another step towards byte-based interfaces everywhere.  Replace
the sector-based bdrv_co_discard() with a new byte-based
bdrv_co_pdiscard(), which silently ignores any unaligned head
or tail.  Driver callbacks will be converted in followup patches.

By calculating the alignment outside of the loop, and clamping
the max discard to an aligned value, we can simplify the actions
done within the loop.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-2-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:54 +01:00
Eric Blake 6bd01f14db iscsi: Rely on block layer to break up large requests
Now that the block layer honors max_request, we don't need to
bother with an EINVAL on overlarge requests, but can instead
assert that requests are well-behaved.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468607524-19021-7-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:54 +01:00
Eric Blake 1e2a77a851 nbd: Drop unused offset parameter
Now that NBD relies on the block layer to fragment things, we no
longer need to track an offset argument for which fragment of
a request we are actually servicing.

While at it, use true and false instead of 0 and 1 for a bool
parameter.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468607524-19021-6-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:54 +01:00
Eric Blake fb1a6de14a nbd: Rely on block layer to break up large requests
Now that the block layer will honor max_transfer, we can simplify
our code to rely on that guarantee.

The readv code can call directly into nbd-client, just as the
writev code has done since commit 52a4650.

Interestingly enough, while qemu-io 'w 0 40m' splits into a 32M
and 8M transaction, 'w -z 0 40m' splits into two 16M and an 8M,
because the block layer caps the bounce buffer for writing zeroes
at 16M.  When we later introduce support for NBD_CMD_WRITE_ZEROES,
we can get a full 32M zero write (or larger, if the client and
server negotiate that write zeroes can use a larger size than
ordinary writes).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468607524-19021-5-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:54 +01:00
Eric Blake 04ed95f484 block: Fragment writes to max transfer length
Drivers should be able to rely on the block layer honoring the
max transfer length, rather than needing to return -EINVAL
(iscsi) or manually fragment things (nbd).  We already fragment
write zeroes at the block layer; this patch adds the fragmentation
for normal writes, after requests have been aligned (fragmenting
before alignment would lead to multiple unaligned requests, rather
than just the head and tail).

When fragmenting a large request where FUA was requested, but
where we know that FUA is implemented by flushing all requests
rather than the given request, then we can still get by with
only one flush.  Note, however, that we need a followup patch
to the raw format driver to avoid a regression in the number of
flushes actually issued.

The return value was previously nebulous on success (sometimes
zero, sometimes the length written); since we never have a short
write, and since fragmenting may store yet another positive
value in 'ret', change the function to always return 0 on success,
matching what we do in bdrv_aligned_preadv().

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1468607524-19021-4-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:54 +01:00
Eric Blake 8a39b4d6e2 raw_bsd: Don't advertise flags not supported by protocol layer
The raw format layer supports all flags via passthrough - but
it only makes sense to pass through flags that the lower layer
actually supports.

The next patch gives stronger reasoning for why this is correct.
At the moment, the raw format layer ignores the max_transfer
limit of its protocol layer, and an attempt to do the qemu-io
'w -f 0 40m' to an NBD server that lacks FUA will pass the entire
40m request to the NBD driver, which then fragments the request
itself into a 32m write, 8m write, and flush.  But once the block
layer starts honoring limits and fragmenting packets, the raw
driver will hand the NBD driver two separate requests; if both
requests have BDRV_REQ_FUA set, then this would result in a 32m
write, flush, 8m write, and second flush.  By having the raw
layer no longer advertise FUA support when the protocol layer
lacks it, we are back to a single flush at the block layer for
the overall 40m request.

Note that 'w -f -z 0 40m' does not currently exhibit the same
problem, because there, the fragmentation does not occur until
at the NBD layer (the raw layer has .bdrv_co_pwrite_zeroes, and
the NBD layer doesn't advertise max_pwrite_zeroes to constrain
things at the raw layer) - but the problem is latent and we
would again have too many flushes without this patch once the
NBD layer implements support for the new NBD_CMD_WRITE_ZEROES
command, if it sets max_pwrite_zeroes to the same 32m limit as
recommended by the NBD protocol.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468607524-19021-3-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:54 +01:00
Eric Blake 1a62d0accd block: Fragment reads to max transfer length
Drivers should be able to rely on the block layer honoring the
max transfer length, rather than needing to return -EINVAL
(iscsi) or manually fragment things (nbd).  This patch adds
the fragmentation in the block layer, after requests have been
aligned (fragmenting before alignment would lead to multiple
unaligned requests, rather than just the head and tail).

The return value was previously nebulous on success on whether
it was zero or the length read; and fragmenting may introduce
yet other non-zero values if we use the last length read.  But
as at least some callers are sloppy and expect only zero on
success, it is easiest to just guarantee 0.

[Fix uninitialized ret local variable in bdrv_aligned_preadv().
--Stefan]

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1468607524-19021-2-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:54 +01:00
Prasanna Kumar Kalever 6c7189bb29 block/gluster: add support for multiple gluster servers
This patch adds a way to specify multiple volfile servers to the gluster
block backend of QEMU with tcp|rdma transport types and their port numbers.

Problem:

Currently VM Image on gluster volume is specified like this:

file=gluster[+tcp]://host[:port]/testvol/a.img

Say we have three hosts in a trusted pool with replica 3 volume in action.
When the host mentioned in the command above goes down for some reason,
the other two hosts are still available. But there's currently no way
to tell QEMU about them.

Solution:

New way of specifying VM Image on gluster volume with volfile servers:
(We still support old syntax to maintain backward compatibility)

Basic command line syntax looks like:

Pattern I:
 -drive driver=gluster,
        volume=testvol,path=/path/a.raw,[debug=N,]
        server.0.type=tcp,
        server.0.host=1.2.3.4,
        server.0.port=24007,
        server.1.type=unix,
        server.1.socket=/path/socketfile

Pattern II:
 'json:{"driver":"qcow2","file":{"driver":"gluster",
       "volume":"testvol","path":"/path/a.qcow2",["debug":N,]
       "server":[{hostinfo_1}, ...{hostinfo_N}]}}'

   driver      => 'gluster' (protocol name)
   volume      => name of gluster volume where our VM image resides
   path        => absolute path of image in gluster volume
  [debug]      => libgfapi loglevel [(0 - 9) default 4 -> Error]

  {hostinfo}   => {{type:"tcp",host:"1.2.3.4"[,port=24007]},
                   {type:"unix",socket:"/path/sockfile"}}

   type        => transport type used to connect to gluster management daemon,
                  it can be tcp|unix
   host        => host address (hostname/ipv4/ipv6 addresses/socket path)
   port        => port number on which glusterd is listening.
   socket      => path to socket file

Examples:
1.
 -drive driver=qcow2,file.driver=gluster,
        file.volume=testvol,file.path=/path/a.qcow2,file.debug=9,
        file.server.0.type=tcp,
        file.server.0.host=1.2.3.4,
        file.server.0.port=24007,
        file.server.1.type=unix,
        file.server.1.socket=/var/run/glusterd.socket
2.
  'json:{"driver":"qcow2","file":{"driver":"gluster","volume":"testvol",
         "path":"/path/a.qcow2","debug":9,"server":
         [{"type":"tcp","host":"1.2.3.4","port":"24007"},
          {"type":"unix","socket":"/var/run/glusterd.socket"}
         ]}}'

This patch gives a mechanism to provide all the server addresses, which are in
replica set, so in case host1 is down VM can still boot from any of the
active hosts.

This is equivalent to the backup-volfile-servers option supported by
mount.glusterfs (FUSE way of mounting gluster volume)

credits: sincere thanks to all the supporters

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1468947453-5433-6-git-send-email-prasanna.kalever@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-19 17:38:50 -04:00
Prasanna Kumar Kalever 7edac2ddeb block/gluster: using new qapi schema
this patch adds 'GlusterServer' related schema in qapi/block-core.json

[Jeff: minor fix-ups of comments and formatting, per patch reviews]

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1468947453-5433-5-git-send-email-prasanna.kalever@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-19 17:36:11 -04:00
Prasanna Kumar Kalever 0552ff2465 block/gluster: deprecate rdma support
gluster volfile server fetch happens through unix and/or tcp, it doesn't
support volfile fetch over rdma. The rdma code may actually mislead,
so to make sure things do not break, for now we fallback to tcp when requested
for rdma, with a warning.

If you are wondering how this worked all these days, its the gluster libgfapi
code which handles anything other than unix transport as socket/tcp, sad but
true.

Also gluster doesn't support ipv6 addresses, removing the ipv6 related
comments/docs section

[Jeff: Minor grammatical fixes in comments and commit message, per
review comments]

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1468947453-5433-4-git-send-email-prasanna.kalever@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-19 17:20:44 -04:00
Prasanna Kumar Kalever f70c50c817 block/gluster: code cleanup
unified coding styles of multiline function arguments and other error functions
moved random declarations of structures and other list variables

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1468947453-5433-3-git-send-email-prasanna.kalever@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-19 17:08:12 -04:00
Prasanna Kumar Kalever d5cf4079ca block/gluster: rename [server, volname, image] -> [host, volume, path]
A future patch will add support for multiple gluster servers. Existing
terminology is a bit unusual in relation to what names are used by
other networked devices, and doesn't map very well to the terminology
we expect to use for multiple servers.  Therefore, rename the following
options:
'server'  -> 'host'
'image'   -> 'path'
'volname' -> 'volume'

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1468947453-5433-2-git-send-email-prasanna.kalever@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-19 17:08:12 -04:00
Denis V. Lunev cf56a3c632 mirror: fix request throttling in drive-mirror
There are 2 deficiencies here:
- mirror_iteration could start several requests inside. Thus we could
  simply have more in_flight requests than MAX_IN_FLIGHT.
- keeping this in mind throttling in mirror_run which is checking
  s->in_flight == MAX_IN_FLIGHT is wrong.

The patch adds the check and throttling into mirror_iteration and fixes
the check in mirror_run() to be sure.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1466598927-5990-1-git-send-email-den@openvz.org
CC: Jeff Cody <jcody@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
(cherry picked from commit e648dc95c28fbca12e67be26a1fc4b9a0676c3fe)
2016-07-19 17:03:44 -04:00
Denis V. Lunev 4b5004d9fc mirror: improve performance of mirroring of empty disk
We should not take into account zero blocks for delay calculations.
They are not read and thus IO throttling is not required. In the
other case VM migration with 16 Tb QCOW2 disk with 4 Gb of data takes
days.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1468503209-19498-9-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
CC: Eric Blake <eblake@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-19 17:02:49 -04:00
Denis V. Lunev c7c2769c0e mirror: efficiently zero out target
With a bdrv_co_write_zeroes method on a target BDS and when this method
is working as indicated by the bdrv_can_write_zeroes_with_unmap(), zeroes
will not be placed into the wire. Thus the target could be very efficiently
zeroed out. This should be done with the largest chunk possible.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1468503209-19498-8-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
CC: Eric Blake <eblake@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-19 16:54:46 -04:00
Denis V. Lunev b7d5062c9c mirror: optimize dirty bitmap filling in mirror_run a bit
There is no need to scan allocation tables if we have mark_all_dirty flag
set. Just mark it all dirty.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1468503209-19498-7-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-19 16:54:46 -04:00
Denis V. Lunev c0b363ad43 mirror: create mirror_dirty_init helper for mirror_run
The code inside the helper will be extended in the next patch. mirror_run
itself is overbloated at the moment.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1468503209-19498-5-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
CC: Eric Blake <eblake@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-19 16:54:46 -04:00
Denis V. Lunev 49efb1f5b0 mirror: create mirror_throttle helper
The patch also places last_pause_ns from stack in mirror_run into
MirrorBlockJob structure. This helper will be useful in next patches.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1468503209-19498-4-git-send-email-den@openvz.org
CC: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: Eric Blake <eblake@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-19 16:54:46 -04:00
Denis V. Lunev 531509ba28 mirror: make sectors_in_flight int64_t
We keep here the sum of int fields. Thus this could easily overflow,
especially when we will start sending big requests in next patches.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1468503209-19498-3-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-19 16:54:46 -04:00
Denis V. Lunev 6d07859926 dirty-bitmap: operate with int64_t amount
Underlying HBitmap operates even with uint64_t. Thus this change is safe.
This would be useful f.e. to mark entire bitmap dirty in one call.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1468503209-19498-2-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-19 16:54:46 -04:00
Peter Maydell a3b3437721 * two old patches from prospective GSoC students
* i386 -kernel device tree support
 * Coverity fix
 * memory usage improvement from Peter
 * checkpatch fix
 * g_path_get_dirname cleanup
 * caching of block status for iSCSI
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJXjcwdAAoJEL/70l94x66D0s8IAKyEBxtATSXZG90jJR+uoCbv
 oK6ea9RDWUZxa2hKjcXh9Be+g4pTv99BqTKJJt+uwkFHAAJl0gvVty+EHE/2sfyo
 Nlt9FlWibxBdSoHxeCq4jg9APWORxcSx3rspg0I8TdxbweKdm9onvXEjfvmhucqG
 FfPSIHg5vsoutCPEDTXfaJDFiLw+rV7Em53kxD/y4VhHZlAWhahpCHSL/lWGRoLp
 B0mKvoHhqHR/EqJr7y7fuga+Aoimyh8R6dUfpxuXQely3309V7znhq0erPQWvSwX
 JKRITmGbWW3HOjplqBT971eH5iH0bDEryx91Oas9VNpm9eGr6qygePhc1eMszxE=
 =pBxf
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* two old patches from prospective GSoC students
* i386 -kernel device tree support
* Coverity fix
* memory usage improvement from Peter
* checkpatch fix
* g_path_get_dirname cleanup
* caching of block status for iSCSI

# gpg: Signature made Tue 19 Jul 2016 07:43:41 BST
# gpg:                using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream:
  target-i386: Remove redundant HF_SOFTMMU_MASK
  block/iscsi: allow caching of the allocation map
  block/iscsi: fix rounding in iscsi_allocationmap_set
  Move README to markdown
  cpu-exec: Move down some declarations in cpu_exec()
  exec: avoid realloc in phys_map_node_reserve
  checkpatch: consider git extended headers valid patches
  megasas: remove useless check for cmd->frame
  compiler: never omit assertions if using a static analysis tool
  hw/i386: add device tree support
  Changed malloc to g_malloc, free to g_free in bsd-user/qemu.h
  use g_path_get_dirname instead of dirname

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-07-19 15:08:05 +01:00
Peter Maydell ad31cd4c69 -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
 
 iQIcBAABAgAGBQJXjV3bAAoJEH3vgQaq/DkOArAQAKc03QhwukqT2aZ5zs7kZ1Hv
 PCHrISL0dKGK3YfyiSitppoxr6eBoBR9UIEjaNlW0XwujqwWdWfJ7kIbVSGAyWqR
 YWmV+bA+TUWXz+tFbeDI0maMH9GNVCAuvQiqldmJxhZ303pf5cksZ9CqiALSylTY
 t5XUB6nhV7MPto63C2X2xjLkKxlsT9KOTsYxGVgVXwUzgW1lAuu8Lo6eNULXCgUa
 j+azgSFAiUOKwfKxcKD25kPOxgWlrxkGRc2LdFlopEzShENROhR2r9ut3okgAoM4
 KPVIE3jSsLMhNr9bRQRUJw53vRSL/bxFvlCdzKiBFSo7wNMKWNA9RWF6+If1Jvoi
 Am+BzINCfNfoFmqlXppqWGlapk9ZtmDGbPwaUyT6NJ9axAASTQxcj8QOjGEX07UE
 ubvzIXx7D1Amo59/4RWRXVDpMb9+p3npqkuCL+DWZzq7EVB42ig8+fKhijhS4jUK
 2DA7uL4orUjUIoTbJZsKciw7MfaWuP2/SnP1VRNRXSsiNg5N4qJDUB6Wo5AKksQB
 LWP4Ou4irPj/ZGvMhJBMMOQ5kl2maqj8beP9pVjltVjGVzAihAyuXHUqjPdqe1R9
 PLQidHKCb1OaJWi3c53vayuzlINH9iY+adjgSuYxi1QOZ+uqEsjSs+3+m1gyTOeh
 sd8VU/TtJbiGBCh9VTTf
 =9N8+
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/jnsnow/tags/ide-pull-request' into staging

# gpg: Signature made Mon 18 Jul 2016 23:53:15 BST
# gpg:                using RSA key 0x7DEF8106AAFC390E
# gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>"
# Primary key fingerprint: FAEB 9711 A12C F475 812F  18F2 88A9 064D 1835 61EB
#      Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76  CBD0 7DEF 8106 AAFC 390E

* remotes/jnsnow/tags/ide-pull-request:
  block: ignore flush requests when storage is clean
  tests: in IDE and AHCI tests perform DMA write before flushing
  ide: set retry_unit for PIO and FLUSH requests
  ide: refactor retry_unit set and clear into separate function

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-07-19 11:47:07 +01:00
Peter Lieven e1123a3b40 block/iscsi: allow caching of the allocation map
until now the allocation map was used only as a hint if a cluster
is allocated or not. If a block was not allocated (or Qemu had
no info about the allocation status) a get_block_status call was
issued to check the allocation status and possibly avoid
a subsequent read of unallocated sectors. If a block known to be
allocated the get_block_status call was omitted. In the other case
a get_block_status call was issued before every read to avoid
the necessity for a consistent allocation map. To avoid the
potential overhead of calling get_block_status for each and
every read request this took only place for the bigger requests.

This patch enhances this mechanism to cache the allocation
status and avoid calling get_block_status for blocks where
the allocation status has been queried before. This allows
for bypassing the read request even for smaller requests and
additionally omits calling get_block_status for known to be
unallocated blocks.

Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1468831940-15556-3-git-send-email-pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-19 08:34:53 +02:00
Peter Lieven eb36b953e0 block/iscsi: fix rounding in iscsi_allocationmap_set
when setting clusters as alloacted the boundaries have
to be expanded. As Paolo pointed out the calculation of
the number of clusters is wrong:

Suppose cluster_sectors is 2, sector_num = 1, nb_sectors = 6:

In the "mark allocated" case, you want to set 0..8, i.e.
cluster_num=0, nb_clusters=4.

   0--.--2--.--4--.--6--.--8
   <--|_________________|-->  (<--> = expanded)

Instead you are setting nb_clusters=3, so that 6..8 is not marked.

   0--.--2--.--4--.--6--.--8
   <--|______________|!!!     (! = wrong)

Cc: qemu-stable@nongnu.org
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1468831940-15556-2-git-send-email-pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-19 08:34:53 +02:00
Evgeny Yakovlev 3ff2f67a7c block: ignore flush requests when storage is clean
Some guests (win2008 server for example) do a lot of unnecessary
flushing when underlying media has not changed. This adds additional
overhead on host when calling fsync/fdatasync.

This change introduces a write generation scheme in BlockDriverState.
Current write generation is checked against last flushed generation to
avoid unnessesary flushes.

The problem with excessive flushing was found by a performance test
which does parallel directory tree creation (from 2 processes).
Results improved from 0.424 loops/sec to 0.432 loops/sec.
Each loop creates 10^3 directories with 10 files in each.

This affected some blkdebug testcases that were expecting error logs from
failure-injected flushes which are now skipped entirely
(tests 026 071 089).

This also affects the performance of block jobs and thus BLOCK_JOB_READY
events for driver-mirror and active block-commit commands now arrives
faster, before QMP send successfully returns to caller (tests 141 144).

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468870792-7411-5-git-send-email-den@openvz.org
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
2016-07-18 18:19:01 -04:00
Roman Pen 5e1b34a3fa linux-aio: prevent submitting more than MAX_EVENTS
Invoking io_setup(MAX_EVENTS) we ask kernel to create ring buffer for us
with specified number of events.  But kernel ring buffer allocation logic
is a bit tricky (ring buffer is page size aligned + some percpu allocation
are required) so eventually more than requested events number is allocated.

From a userspace side we have to follow the convention and should not try
to io_submit() more or logic, which consumes completed events, should be
changed accordingly.  The pitfall is in the following sequence:

    MAX_EVENTS = 128
    io_setup(MAX_EVENTS)

    io_submit(MAX_EVENTS)
    io_submit(MAX_EVENTS)

    /* now 256 events are in-flight */

    io_getevents(MAX_EVENTS) = 128

    /* we can handle only 128 events at once, to be sure
     * that nothing is pended the io_getevents(MAX_EVENTS)
     * call must be invoked once more or hang will happen. */

To prevent the hang or reiteration of io_getevents() call this patch
restricts the number of in-flights, which is now limited to MAX_EVENTS.

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468415004-31755-1-git-send-email-roman.penyaev@profitbricks.com
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: qemu-devel@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-18 15:10:52 +01:00
Paolo Bonzini 0187f5c9cb linux-aio: share one LinuxAioState within an AioContext
This has better performance because it executes fewer system calls
and does not use a bottom half per disk.

Originally proposed by Ming Lei.

[Changed #include "raw-aio.h" to "block/raw-aio.h" in win32-aio.c to fix
build error as reported by Peter Maydell <peter.maydell@linaro.org>.
--Stefan]

Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1467650000-51385-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

squash! linux-aio: share one LinuxAioState within an AioContext
2016-07-18 15:09:31 +01:00
Max Reitz c4b48bfdc5 vvfat: Fix qcow write target driver specification
First, bdrv_open_child() expects all options for the child to be
prefixed by the child's name (and a separating dot). Second,
bdrv_open_child() does not take ownership of the QDict passed to it but
only extracts all options for the child, so if a QDict is created for
the sole purpose of passing it to bdrv_open_child(), it needs to be
freed afterwards.

This patch makes vvfat adhere to both of these rules.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160711135452.11304-1-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-07-13 13:41:39 +02:00
Reda Sallahi 524089bce4 vmdk: fix metadata write regression
Commit "cdeaf1f vmdk: add bdrv_co_write_zeroes" causes a regression on
writes. It writes metadata after every write instead of doing it only once
for each cluster.

vmdk_pwritev() writes metadata whenever m_data is set as valid so this patch
sets m_data as valid only when we have a new cluster which hasn't been
allocated before or a zero grain.

Signed-off-by: Reda Sallahi <fullmanet@gmail.com>
Message-id: 20160707084249.29084-1-fullmanet@gmail.com
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-07-13 13:41:39 +02:00
Sascha Silbe f14a39ccb9 Improve block job rate limiting for small bandwidth values
ratelimit_calculate_delay() previously reset the accounting every time
slice, no matter how much data had been processed before. This had (at
least) two consequences:

1. The minimum speed is rather large, e.g. 5 MiB/s for commit and stream.

   Not sure if there are real-world use cases where this would be a
   problem. Mirroring and backup over a slow link (e.g. DSL) would
   come to mind, though.

2. Tests for block job operations (e.g. cancel) were rather racy

   All block jobs currently use a time slice of 100ms. That's a
   reasonable value to get smooth output during regular
   operation. However this also meant that the state of block jobs
   changed every 100ms, no matter how low the configured limit was. On
   busy hosts, qemu often transferred additional chunks until the test
   case had a chance to cancel the job.

Fix the block job rate limit code to delay for more than one time
slice to address the above issues. To make it easier to handle
oversized chunks we switch the semantics from returning a delay
_before_ the current request to a delay _after_ the current
request. If necessary, this delay consists of multiple time slice
units.

Since the mirror job sends multiple chunks in one go even if the rate
limit was exceeded in between, we need to keep track of the start of
the current time slice so we can correctly re-compute the delay for
the updated amount of data.

The minimum bandwidth now is 1 data unit per time slice. The block
jobs are currently passing the amount of data transferred in sectors
and using 100ms time slices, so this translates to 5120
bytes/second. With chunk sizes usually being O(512KiB), tests have
plenty of time (O(100s)) to operate on block jobs. The chance of a
race condition now is fairly remote, except possibly on insanely
loaded systems.

Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Message-id: 1467127721-9564-2-git-send-email-silbe@linux.vnet.ibm.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-07-13 13:41:38 +02:00
Max Reitz c834cba905 qcow2: Fix qcow2_get_cluster_offset()
Recently, qcow2_get_cluster_offset() has been changed to work with bytes
instead of sectors. This invalidated some assertions and introduced a
possible integer multiplication overflow.

This could be reproduced using e.g.

$ qemu-img create -f qcow2 -o cluster_size=1M blub.qcow2 8G
Formatting 'foo.qcow2', fmt=qcow2 size=8589934592 encryption=off
cluster_size=1048576 lazy_refcounts=off refcount_bits=16
$ qemu-io -c map blub.qcow2
qemu-io: qemu/block/qcow2-cluster.c:504: qcow2_get_cluster_offset:
Assertion `bytes_needed <= INT_MAX' failed.
[1]    20775 abort (core dumped)  qemu-io -c map foo.qcow2

This patch removes the now wrong assertion, adding comments and more
assertions to prove its correctness (and fixing the overflow which would
become apparent with the original assertion removed).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160620142623.24471-3-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-07-13 13:41:38 +02:00
Max Reitz 84c26520d3 qcow2: Avoid making the L1 table too big
We refuse to open images whose L1 table we deem "too big". Consequently,
we should not produce such images ourselves.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160615153630.2116-3-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
[mreitz: Added QEMU_BUILD_BUG_ON()]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-07-13 13:41:38 +02:00
Kevin Wolf 8c39825218 block/qdev: Allow configuring rerror/werror with qdev properties
The rerror/werror policies are implemented in the devices, so that's
where they should be configured. In comparison to the old options in
-drive, the qdev properties are only added to those devices that
actually support them.

If the option isn't given (or "auto" is specified), the setting of the
BlockBackend is used for compatibility with the old options. For block
jobs, "auto" is the same as "enospc".

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-07-13 13:32:27 +02:00
Kevin Wolf 1e8fb7f1ee commit: Fix use of error handling policy
Commit implemented the 'enospc' policy as 'ignore' if the error was not
ENOSPC. The QAPI documentation promises that it's treated as 'stop'.
Using the common block job error handling function fixes this and also
adds the missing QMP event.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-07-13 13:32:27 +02:00
Paolo Bonzini 0b8b8753e4 coroutine: move entry argument to qemu_coroutine_create
In practice the entry argument is always known at creation time, and
it is confusing that sometimes qemu_coroutine_enter is used with a
non-NULL argument to re-enter a coroutine (this happens in
block/sheepdog.c and tests/test-coroutine.c).  So pass the opaque value
at creation time, for consistency with e.g. aio_bh_new.

Mostly done with the following semantic patch:

@ entry1 @
expression entry, arg, co;
@@
- co = qemu_coroutine_create(entry);
+ co = qemu_coroutine_create(entry, arg);
  ...
- qemu_coroutine_enter(co, arg);
+ qemu_coroutine_enter(co);

@ entry2 @
expression entry, arg;
identifier co;
@@
- Coroutine *co = qemu_coroutine_create(entry);
+ Coroutine *co = qemu_coroutine_create(entry, arg);
  ...
- qemu_coroutine_enter(co, arg);
+ qemu_coroutine_enter(co);

@ entry3 @
expression entry, arg;
@@
- qemu_coroutine_enter(qemu_coroutine_create(entry), arg);
+ qemu_coroutine_enter(qemu_coroutine_create(entry, arg));

@ reentry @
expression co;
@@
- qemu_coroutine_enter(co, NULL);
+ qemu_coroutine_enter(co);

except for the aforementioned few places where the semantic patch
stumbled (as expected) and for test_co_queue, which would otherwise
produce an uninitialized variable warning.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-13 13:26:02 +02:00
Fam Zheng 5af7045bd0 raw-posix: Use qemu_dup
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-13 13:26:02 +02:00
Alberto Garcia fd62c609ed commit: Add 'job-id' parameter to 'block-commit'
This patch adds a new optional 'job-id' parameter to 'block-commit',
allowing the user to specify the ID of the block job to be created.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-13 13:26:02 +02:00
Alberto Garcia 2323322ed0 stream: Add 'job-id' parameter to 'block-stream'
This patch adds a new optional 'job-id' parameter to 'block-stream',
allowing the user to specify the ID of the block job to be created.

The HMP 'block_stream' command remains unchanged.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-13 13:26:02 +02:00
Alberto Garcia 70559d499c backup: Add 'job-id' parameter to 'blockdev-backup' and 'drive-backup'
This patch adds a new optional 'job-id' parameter to 'blockdev-backup'
and 'drive-backup', allowing the user to specify the ID of the block
job to be created.

The HMP 'drive_backup' command remains unchanged.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-13 13:26:02 +02:00
Alberto Garcia 71aa98678c mirror: Add 'job-id' parameter to 'blockdev-mirror' and 'drive-mirror'
This patch adds a new optional 'job-id' parameter to 'blockdev-mirror'
and 'drive-mirror', allowing the user to specify the ID of the block
job to be created.

The HMP 'drive_mirror' command remains unchanged.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-13 13:26:02 +02:00
Alberto Garcia 7f0317cfc8 blockjob: Add 'job_id' parameter to block_job_create()
When a new job is created, the job ID is taken from the device name of
the BDS. This patch adds a new 'job_id' parameter to let the caller
provide one instead.

This patch also verifies that the ID is always unique and well-formed.
This causes problems in a couple of places where no ID is being set,
because the BDS does not have a device name.

In the case of test_block_job_start() (from test-blockjob-txn.c) we
can simply use this new 'job_id' parameter to set the missing ID.

In the case of img_commit() (from qemu-img.c) we still don't have the
API to make commit_active_start() set the job ID, so we solve it by
setting a default value. We'll get rid of this as soon as we extend
the API.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-13 13:26:02 +02:00
Alberto Garcia 9df229c3ca blockjob: Update description of the 'id' field
The 'id' field of the BlockJob structure will be able to hold any ID,
not only a device name. This patch updates the description of that
field and the error messages where it is being used.

Soon we'll add the ability to set an arbitrary ID when creating a
block job.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-13 13:26:02 +02:00
Markus Armbruster a9c94277f0 Use #include "..." for our own headers, <...> for others
Tracked down with an ugly, brittle and probably buggy Perl script.

Also move includes converted to <...> up so they get included before
ours where that's obviously okay.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-07-12 16:19:16 +02:00
Peter Maydell 975b1c3ac6 QAPI patches for 2016-07-06
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXfMjDAAoJEDhwtADrkYZTyKwP/i5Lva3qhDk9AJhvPNSbnps/
 sxVGVkUTeUM0UoXdKmDChkk2zScv2AOVxL3oibb22PmU1MvDU+XoKYokae/VucVS
 WL16t/Z77QxSJ1yKhXT3i0P7+9sR9Gq2eDHchSSITLreVO8/jNb7u1JjnV4nPyQP
 Scg6sCCnfgSw8W4EOz0wDhwXHbRv7obSY0lGUD1R6FifETHQ98rTpII0BS1AyvbF
 HIJMcGpirAe62BGNQeJv933mrD1EgRfBytQAkEL/l6DDlz/xk9AE26bsjdP5Xa9A
 MwEBje9CPm0ICleqq1ZVKNQQqng2PVfXXXdJYKQKVnIHL+REzRDKbTGTlS4njdVi
 v2i4E7MwDPIf86zRUXJ6eUkWDHgld4of2WZmoxNPZmWWmHMb31iqGcb0//mYmP06
 QOPd/uxpxg9pze4JeQMhzHCbFum2/s43cyqaH24lrwhTw3EsI45t4sgJQTG1NIFt
 ZdPXVwNBfa6ur50dsRWHYonQimpofW94YYupWLggiVGKTzvfSWVQJGc9Ho65P23y
 ZJrkbN3bWNnwVFzIbWFI5TKnE/TsnWvTPwk+9cGSP4MKf1sy6t5nIycoTiP3vC9x
 PesENYYKy5ZkOkKQpC6lGwjkMl24ii3lVl5g4vWXXjtZHwGNmq/7/UJ7Dm4htqIk
 B5gm5gQ/2Tul4Dihlteb
 =BgpV
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2016-07-06' into staging

QAPI patches for 2016-07-06

# gpg: Signature made Wed 06 Jul 2016 10:00:51 BST
# gpg:                using RSA key 0x3870B400EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-qapi-2016-07-06:
  replay: Use new QAPI cloning
  sockets: Use new QAPI cloning
  qapi: Add new clone visitor
  qapi: Add new visit_complete() function
  tests: Factor out common code in qapi output tests
  tests: Clean up test-string-output-visitor
  qmp-output-visitor: Favor new visit_free() function
  string-output-visitor: Favor new visit_free() function
  qmp-input-visitor: Favor new visit_free() function
  string-input-visitor: Favor new visit_free() function
  opts-visitor: Favor new visit_free() function
  qapi: Add new visit_free() function
  qapi: Add parameter to visit_end_*
  qemu-img: Don't leak errors when outputting JSON
  qapi: Improve use of qmp/types.h

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-07-06 11:38:09 +01:00
Eric Blake 3b098d5697 qapi: Add new visit_complete() function
Making each output visitor provide its own output collection
function was the only remaining reason for exposing visitor
sub-types to the rest of the code base.  Add a polymorphic
visit_complete() function which is a no-op for input visitors,
and which populates an opaque pointer for output visitors.  For
maximum type-safety, also add a parameter to the output visitor
constructors with a type-correct version of the output pointer,
and assert that the two uses match.

This approach was considered superior to either passing the
output parameter only during construction (action at a distance
during visit_free() feels awkward) or only during visit_complete()
(defeating type safety makes it easier to use incorrectly).

Most callers were function-local, and therefore a mechanical
conversion; the testsuite was a bit trickier, but the previous
cleanup patch minimized the churn here.

The visit_complete() function may be called at most once; doing
so lets us use transfer semantics rather than duplication or
ref-count semantics to get the just-built output back to the
caller, even though it means our behavior is not idempotent.

Generated code is simplified as follows for events:

|@@ -26,7 +26,7 @@ void qapi_event_send_acpi_device_ost(ACP
|     QDict *qmp;
|     Error *err = NULL;
|     QMPEventFuncEmit emit;
|-    QmpOutputVisitor *qov;
|+    QObject *obj;
|     Visitor *v;
|     q_obj_ACPI_DEVICE_OST_arg param = {
|         info
|@@ -39,8 +39,7 @@ void qapi_event_send_acpi_device_ost(ACP
|
|     qmp = qmp_event_build_dict("ACPI_DEVICE_OST");
|
|-    qov = qmp_output_visitor_new();
|-    v = qmp_output_get_visitor(qov);
|+    v = qmp_output_visitor_new(&obj);
|
|     visit_start_struct(v, "ACPI_DEVICE_OST", NULL, 0, &err);
|     if (err) {
|@@ -55,7 +54,8 @@ void qapi_event_send_acpi_device_ost(ACP
|         goto out;
|     }
|
|-    qdict_put_obj(qmp, "data", qmp_output_get_qobject(qov));
|+    visit_complete(v, &obj);
|+    qdict_put_obj(qmp, "data", obj);
|     emit(QAPI_EVENT_ACPI_DEVICE_OST, qmp, &err);

and for commands:

| {
|     Error *err = NULL;
|-    QmpOutputVisitor *qov = qmp_output_visitor_new();
|     Visitor *v;
|
|-    v = qmp_output_get_visitor(qov);
|+    v = qmp_output_visitor_new(ret_out);
|     visit_type_AddfdInfo(v, "unused", &ret_in, &err);
|-    if (err) {
|-        goto out;
|+    if (!err) {
|+        visit_complete(v, ret_out);
|     }
|-    *ret_out = qmp_output_get_qobject(qov);
|-
|-out:
|     error_propagate(errp, err);

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1465490926-28625-13-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-07-06 10:52:04 +02:00
Eric Blake 1830f22a67 qmp-output-visitor: Favor new visit_free() function
Now that we have a polymorphic visit_free(), we no longer need
qmp_output_visitor_cleanup(); however, we still need to
expose the subtype for qmp_output_get_qobject().

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1465490926-28625-10-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-07-06 10:52:04 +02:00
Eric Blake 09204eac9b opts-visitor: Favor new visit_free() function
Now that we have a polymorphic visit_free(), we no longer need
opts_visitor_cleanup(); which in turn means we no longer need
to return a subtype from opts_visitor_new() nor a public upcast
function.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1465490926-28625-6-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-07-06 10:52:04 +02:00
Eric Blake 1158bb2a05 qapi: Add parameter to visit_end_*
Rather than making the dealloc visitor track of stack of pointers
remembered during visit_start_* in order to free them during
visit_end_*, it's a lot easier to just make all callers pass the
same pointer to visit_end_*.  The generated code has access to the
same pointer, while all other users are doing virtual walks and
can pass NULL.  The dealloc visitor is then greatly simplified.

All three visit_end_*() functions intentionally take a void**,
even though the visit_start_*() functions differ between void**,
GenericList**, and GenericAlternate**.  This is done for several
reasons: when doing a virtual walk, passing NULL doesn't care
what the type is, but when doing a generated walk, we already
have to cast the caller's specific FOO* to call visit_start,
while using void** lets us use visit_end without a cast. Also,
an upcoming patch will add a clone visitor that wants to use
the same implementation for all three visit_end callbacks,
which is made easier if all three share the same signature.

For visitors with already track per-object state (the QMP visitors
via a stack, and the string visitors which do not allow nesting),
add an assertion that the caller is indeed passing the same
pointer to paired calls.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1465490926-28625-4-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-07-06 10:52:04 +02:00
Peter Maydell f1f7a1ddf3 block/qcow2: Don't use cpu_to_*w()
Don't use the cpu_to_*w() functions, which we are trying to deprecate.
Instead either just use cpu_to_*() to do the byteswap, or use
st*_be_p() if we need to do the store somewhere other than to a
variable that's already the correct type.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1466093177-17890-1-git-send-email-peter.maydell@linaro.org
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-07-05 16:54:04 +02:00
Kevin Wolf a03ef88f77 block: Convert bdrv_co_preadv/pwritev to BdrvChild
This is the final patch for converting the common I/O path to take
a BdrvChild parameter instead of BlockDriverState.

The completion of this conversion means that all users that perform I/O
on an image need to actually hold a reference (in the form of BdrvChild,
possible as part of a BlockBackend) to that image. This also protects
against inconsistent use of BlockBackend vs. BlockDriverState functions
because direct use of a BlockDriverState isn't possible any more and
blk->root is private for block-backends.c.

In addition, we can now distinguish different users in the I/O path,
and the future op blockers work is going to add assertions based on
permissions stored in BdrvChild.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05 16:46:27 +02:00
Kevin Wolf e293b7a3df block: Convert bdrv_prwv_co() to BdrvChild
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05 16:46:27 +02:00
Kevin Wolf 720ff280e7 block: Convert bdrv_pwrite_zeroes() to BdrvChild
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05 16:46:27 +02:00
Kevin Wolf d9ca2ea2e2 block: Convert bdrv_pwrite(v/_sync) to BdrvChild
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05 16:46:27 +02:00
Kevin Wolf cf2ab8fc34 block: Convert bdrv_pread(v) to BdrvChild
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05 16:46:27 +02:00
Kevin Wolf 18d51c4bac block: Convert bdrv_write() to BdrvChild
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05 16:46:27 +02:00
Kevin Wolf fbcbbf4e80 block: Convert bdrv_read() to BdrvChild
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05 16:46:27 +02:00
Kevin Wolf f8e2bd538d block: Use BlockBackend for I/O in bdrv_commit()
Just like block jobs, the HMP commit command should use its own
BlockBackend for doing I/O on BlockDriverStates.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05 16:46:27 +02:00
Kevin Wolf 83fd6dd3e7 block: Move bdrv_commit() to block/commit.c
No code changes, just moved from one file to another.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05 16:46:27 +02:00
Kevin Wolf adad6496c5 block: Convert bdrv_co_do_readv/writev to BdrvChild
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05 16:46:27 +02:00
Kevin Wolf 0d1049c7d1 block: Convert bdrv_aio_writev() to BdrvChild
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05 16:46:26 +02:00
Kevin Wolf ebb7af2173 block: Convert bdrv_aio_readv() to BdrvChild
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05 16:46:26 +02:00
Kevin Wolf 25ec177d90 block: Convert bdrv_co_writev() to BdrvChild
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05 16:46:26 +02:00
Kevin Wolf 28b04a8f65 block: Convert bdrv_co_readv() to BdrvChild
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05 16:46:26 +02:00
Kevin Wolf db1e80ee2e vhdx: Some more BlockBackend use in vhdx_create()
This does some easy conversions from bdrv_* to blk_* functions in
vhdx_create(). We should avoid bypassing the BlockBackend layer whenever
possible.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05 16:46:26 +02:00
Kevin Wolf e858a9705c blkreplay: Convert to byte-based I/O
The blkreplay driver only forwards the requests it gets, so converting
it to byte granularity is trivial.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05 16:46:26 +02:00
Kevin Wolf eecc77473b vvfat: Use BdrvChild for s->qcow
vvfat uses a temporary qcow file to cache written data in read-write
mode. In order to do things properly, this should show up in the BDS
graph and I/O should go through BdrvChild like for every other node.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05 16:46:26 +02:00
Denis V. Lunev 1c42f149dd block: fix return code for partial write for Linux AIO
Partial write most likely means that there is not space rather than
"something wrong happens". Thus it would be more natural to return
ENOSPC rather than EINVAL.

The problem actually happens with NBD server, which has reported EINVAL
rather then ENOSPC on the first error using its protocol, which makes
report to the user wrong.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Pavel Borzenkov <pborzenkov@virtuozzo.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:26 +02:00
Eric Blake 5411541270 block: Use bool as appropriate for BDS members
Using int for values that are only used as booleans is confusing.
While at it, rearrange a couple of members so that all the bools
are contiguous.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:26 +02:00
Eric Blake 8cc9c6e92b block: Fix error message style
error_setg() is not supposed to be used for multi-sentence
messages; tweak the message to append a hint instead.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:26 +02:00
Eric Blake a5b8dd2ce8 block: Move request_alignment into BlockLimit
It makes more sense to have ALL block size limit constraints
in the same struct.  Improve the documentation while at it.

Simplify a couple of conditionals, now that we have audited and
documented that request_alignment is always non-zero.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:26 +02:00
Eric Blake d9e0dfa246 block: Split bdrv_merge_limits() from bdrv_refresh_limits()
During bdrv_merge_limits(), we were computing initial limits
based on another BDS in two places.  At first glance, the two
computations are not identical (one is doing straight copying,
the other is doing merging towards or away from zero) - but
when you realize that the first round is starting with all-0
memory, all of the merging happens to work.  Factoring out the
merging makes it easier to track how two BDS limits are merged,
in case we have future reasons to merge in even more limits.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:26 +02:00
Eric Blake ad82be2f4f block: Drop raw_refresh_limits()
The raw block driver was blindly copying all limits from bs->file,
even though: 1. the main bdrv_refresh_limits() already does this
for many of the limits, and 2. blindly copying from the children
can weaken any stricter limits that were already inherited from
the backing chain during the main bdrv_refresh_limits().  Also,
a future patch is about to move .request_alignment into
BlockLimits, and that is a limit that should NOT be copied from
other layers in the BDS chain.

Thus, we can completely drop raw_refresh_limits(), and rely on
the block layer setting up the proper limits.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:25 +02:00
Eric Blake b9f7855a50 block: Switch discard length bounds to byte-based
Sector-based limits are awkward to think about; in our on-going
quest to move to byte-based interfaces, convert max_discard and
discard_alignment.  Rename them, using 'pdiscard' as an aid to
track which remaining discard interfaces need conversion, and so
that the compiler will help us catch the change in semantics
across any rebased code.  The BlockLimits type is now completely
byte-based; and in iscsi.c, sector_limits_lun2qemu() is no
longer needed.

pdiscard_alignment is made unsigned (we use power-of-2 alignments
as bitmasks, where unsigned is easier to think about) while
leaving max_pdiscard signed (since we still have an 'int'
interface); this is comparable to what commit cf081fc did for
write zeroes limits.  We may later want to make everything an
unsigned 64-bit limit - but that requires a bigger code audit.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:25 +02:00
Eric Blake 5def6b80e1 block: Switch transfer length bounds to byte-based
Sector-based limits are awkward to think about; in our on-going
quest to move to byte-based interfaces, convert max_transfer_length
and opt_transfer_length.  Rename them (dropping the _length suffix)
so that the compiler will help us catch the change in semantics
across any rebased code, and improve the documentation.  Use unsigned
values, so that we don't have to worry about negative values and
so that bit-twiddling is easier; however, we are still constrained
by 2^31 of signed int in most APIs.

When a value comes from an external source (iscsi and raw-posix),
sanitize the results to ensure that opt_transfer is a power of 2.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:25 +02:00
Eric Blake 79ba8c986a block: Set default request_alignment during bdrv_refresh_limits()
We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Now that all drivers have been updated to supply an override
of request_alignment during their .bdrv_refresh_limits(), as
needed, the block layer itself can defer setting the default
alignment until part of the overall bdrv_refresh_limits().

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:25 +02:00
Eric Blake a65064816d block: Set request_alignment during .bdrv_refresh_limits()
We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Add a .bdrv_refresh_limits() to all four of our legacy devices
that will always be sector-only (bochs, cloop, dmg, vvfat), in
spite of their recent conversion to expose a byte interface.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:25 +02:00
Eric Blake 2914a1de99 raw-win32: Set request_alignment during .bdrv_refresh_limits()
We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

In this case, raw_probe_alignment() already did what we needed,
so just fix its signature and wire it in correctly.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:25 +02:00
Eric Blake a84178ccff qcow2: Set request_alignment during .bdrv_refresh_limits()
We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:25 +02:00
Eric Blake c8b3b998e2 iscsi: Set request_alignment during .bdrv_refresh_limits()
We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:25 +02:00
Eric Blake 835db3ee7b blkdebug: Set request_alignment during .bdrv_refresh_limits()
We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Note that when the user does not provide "align", then we were
defaulting to bs->request_alignment - but at this stage in the
initialization, that was always 512.  We were also rejecting an
explicit "align":0 from the user; this patch now allows that,
as an explicit request for the default alignment (which may not
always be 512 in the future).

qemu-iotests 77 is particularly sensitive to the fact that we
can specify an artificial alignment override in blkdebug, and
that override must continue to work even when limits are
refreshed on an already open device.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:25 +02:00
Eric Blake 24ce9a2026 block: Give nonzero result to blk_get_max_transfer_length()
Making all callers special-case 0 as unlimited is awkward,
and we DO have a hard maximum of BDRV_REQUEST_MAX_SECTORS given
our current block layer API limits.

In the case of scsi, this means that we now always advertise a
limit to the guest, even in cases where the underlying layers
previously use 0 for no inherent limit beyond the block layer.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:25 +02:00
Eric Blake f9e95af0a6 iscsi: Advertise realistic limits to block layer
The function sector_limits_lun2qemu() returns a value in units of
the block layer's 512-byte sector, and can be as large as
0x40000000, which is much larger than the block layer's inherent
limit of BDRV_REQUEST_MAX_SECTORS.  The block layer already
handles '0' as a synonym to the inherent limit, and it is nicer
to return this value than it is to calculate an arbitrary
maximum, for two reasons: we want to ensure that the block layer
continues to special-case '0' as 'no limit beyond the inherent
limits'; and we want to be able to someday expand the block
layer to allow 64-bit limits, where auditing for uses of
BDRV_REQUEST_MAX_SECTORS will help us make sure we aren't
artificially constraining iscsi to old block layer limits.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:25 +02:00
Eric Blake 202204717a nbd: Advertise realistic limits to block layer
We were basing the advertisement of maximum discard and transfer
length off of UINT32_MAX, but since the rest of the block layer
has signed int limits on a transaction, nothing could ever reach
that maximum, and we risk overflowing an int once things are
converted to byte-based rather than sector-based limits.  What's
more, we DO have a much smaller limit: both the current kernel
and qemu-nbd have a hard limit of 32M on a read or write
transaction, and while they may also permit up to a full 32 bits
on a discard transaction, the upstream NBD protocol is proposing
wording that without any explicit advertisement otherwise,
clients should limit ALL requests to the same limits as read and
write, even though the other requests do not actually require as
many bytes across the wire.  So the better limit to tell the
block layer is 32M for both values.

Behavior doesn't actually change with this patch (the block layer
is currently ignoring the max_transfer advertisements); but when
that problem is fixed in a later series, this patch will prevent
the exposure of a latent bug.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:24 +02:00
Eric Blake 476b923c32 nbd: Allow larger requests
The NBD layer was breaking up request at a limit of 2040 sectors
(just under 1M) to cater to old qemu-nbd. But the server limit
was raised to 32M in commit 2d8214885 to match the kernel, more
than three years ago; and the upstream NBD Protocol is proposing
documentation that without any explicit communication to state
otherwise, a client should be able to safely assume that a 32M
transaction will work.  It is time to rely on the larger sizing,
and any downstream distro that cares about maximum
interoperability to older qemu-nbd servers can just tweak the
value of #define NBD_MAX_SECTORS.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:24 +02:00
Eric Blake 82524274ea block: Fix harmless off-by-one in bdrv_aligned_preadv()
If the amount of data to read ends exactly on the total size
of the bs, then we were wasting time creating a local qiov
to read the data in preparation for what would normally be
appending zeroes beyond the end, even though this corner case
has nothing further to do.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:24 +02:00
Eric Blake a604fa2ba5 block: Document supported flags during bdrv_aligned_preadv()
We don't pass any flags on to drivers to handle.  Tighten an
assert to explain why we pass 0 to bdrv_driver_preadv(), and add
some comments on things to be aware of if we want to turn on
per-BDS BDRV_REQ_FUA support during reads in the future.  Also,
document that we may want to consider using unmap during
copy-on-read operations where the read is all zeroes.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:24 +02:00
Eric Blake cff86b38ac block: Tighter assertions on bdrv_aligned_pwritev()
For symmetry with bdrv_aligned_preadv(), assert that the caller
really has aligned things properly. This requires adding an align
parameter, which is used now only in the new asserts, but will
come in handy in a later patch that adds auto-fragmentation to the
max transfer size, since that value need not always be a multiple
of the alignment, and therefore must be rounded down.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:24 +02:00
Peter Maydell 1ec20c2a3a * serial port fixes (Paolo)
* Q35 modeling improvements (Paolo, Vasily)
 * chardev cleanup improvements (Marc-André)
 * iscsi bugfix (Peter L.)
 * cpu_exec patch from multi-arch patches (Peter C.)
 * pci-assign tweak (Lin Ma)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJXc+GeAAoJEL/70l94x66DtIAH/3+eUBqSxVJ3SMUxJep2Op07
 lIWqw1GHAdw1gWQDG4HzokKWrVVp/+NFYQjRFcNMfF8L+/Xm6hHAYc7Y4DMkDxSw
 zHX2BT93gPcaFJRz3Md8n2anzFHaWePx7LucPjaoas2OzrbVKXC8JT6n3GGnKQzZ
 0CxDoyW4keI4ZVAOy9SOKsLPxdSvG8uLvaZU98l/YS/TuiGzpv8IWcdHR+k1hua+
 FIenzj7jD9+JFoLEUWkU0pYs33J6yYKPiZn7HgGL9RNWKPFR88+CtMdYXgfOPo7z
 i05L9RTmL4SpahmStPN2r72MC0T0ub0czk/+qxBNms4r/2gBwaSyldmcTfAXM9o=
 =DA8v
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* serial port fixes (Paolo)
* Q35 modeling improvements (Paolo, Vasily)
* chardev cleanup improvements (Marc-André)
* iscsi bugfix (Peter L.)
* cpu_exec patch from multi-arch patches (Peter C.)
* pci-assign tweak (Lin Ma)

# gpg: Signature made Wed 29 Jun 2016 15:56:30 BST
# gpg:                using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (35 commits)
  socket: unlink unix socket on remove
  socket: add listen feature
  char: clean up remaining chardevs when leaving
  vhost-user: disable chardev handlers on close
  vhost-user-test: fix g_cond_wait_until compat implementation
  vl: smp_parse: fix regression
  ich9: implement SCI_IRQ_SEL register
  ich9: implement ACPI_EN register
  serial: reinstate watch after migration
  serial: remove watch on reset
  char: change qemu_chr_fe_add_watch to return unsigned
  serial: separate serial_xmit and serial_watch_cb
  serial: simplify tsr_retry reset
  serial: make tsr_retry unsigned
  iscsi: fix assertion in is_sector_request_lun_aligned
  target-*: Don't redefine cpu_exec()
  pci-assign: Move "Invalid ROM" error message to pci-assign-load-rom.c
  vnc: generalize "VNC server running on ..." message
  scsi: esp: fix migration
  MC146818 RTC: add GPIO access to output IRQ
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-29 19:14:48 +01:00
Peter Lieven 0ead93120e iscsi: fix assertion in is_sector_request_lun_aligned
Commit 94d047a added an assertion the the request alignment check.
This introduced 2 issues:
 a) A off-by-one error since a request of BDRV_REQUEST_MAX_SECTORS
    is actually allowed.
 b) The bdrv_get_block_status call in the read path to check the allocation
    status requests up to INT_MAX sectors which triggers the assertion.

Fixes: 94d047a35b
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1466414680-18383-1-git-send-email-pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-29 14:03:47 +02:00
Changlong Xie 15d6729850 mirror: fix misleading comments
s/target bs/to_replace/, also we check to_replace bs is not
blocked in qmp_drive_mirror() not here

Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1466672241-22485-3-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-06-28 23:08:25 -04:00
Changlong Xie b48100cf07 blockjob: assert(cb) when create job
Callback for block job should always exist

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1466672241-22485-2-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-06-28 23:08:13 -04:00
John Snow e4808881cb mirror: limit niov to IOV_MAX elements, again
During the refactor of mirror_iteration in e5b43573,
we regressed the fix introduced in cae98cb8.

This patch re-adds IOV_MAX checking to cases where we
aren't checking alignment (and size) already.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1466625064-11280-3-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-06-28 22:53:03 -04:00
John Snow 176129552f mirror: clarify mirror_do_read return code
mirror_do_read intends to return the number of sectors processed after
the starting sector, without regard to how many sectors were processed
before the starting sector due to alignment.

Clean up the comments and code to hopefully illustrate this more clearly.

This also fixes an issue in initialization where if the mirror buffer size
is initialized to smaller than the number of sectors being requested for
transfer, we report back an incorrectly large number to the caller.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1466625064-11280-2-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-06-28 22:53:03 -04:00
Jeff Cody 7eac868a50 block/gluster: add support for selecting debug logging level
This adds commandline support for the logging level of the
gluster protocol driver, output to stdout.  The option is 'debug',
e.g.:

-drive filename=gluster://192.168.15.180/gv2/test.qcow2,debug=9

Debug levels are 0-9, with 9 being the most verbose, and 0 representing
no debugging output.  The default is the same as it was before, which
is a level of 4.  The current logging levels defined in the gluster
source are:

    0 - None
    1 - Emergency
    2 - Alert
    3 - Critical
    4 - Error
    5 - Warning
    6 - Notice
    7 - Info
    8 - Debug
    9 - Trace

(From: glusterfs/logging.h)

Reviewed-by: Niels de Vos <ndevos@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-06-28 22:52:45 -04:00
Denis V. Lunev ff04198bf5 mirror: fix trace_mirror_yield_in_flight usage in mirror_iteration()
trace_mirror_yield_in_flight accepts 2nd arguments in sectors while here
we pass chunks instead.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1466518157-27140-1-git-send-email-den@openvz.org
CC: Jeff Cody <jcody@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-06-28 22:52:45 -04:00
Peter Lieven d99b26c42a block/nfs: add support for libnfs pagecache
upcoming libnfs will have support for a read cache that can
significantly help to speed up requests since libnfs by design
circumvents the kernel cache.

Example:
 qemu -cdrom nfs://127.0.0.1/iso/my.iso?pagecache=1024

The pagecache parameters takes the maximum amount of pages to
cache.  A page in libnfs is always the NFS_BLKSIZE which is
4KB.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1463662083-20814-3-git-send-email-pl@kamp.de
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-06-28 22:52:45 -04:00
Peter Lieven 38f8d5e025 block/nfs: refuse readahead if cache.direct is on
if we open a NFS export with disabled cache we should refuse
the readahead feature as it will cache data inside libnfs.

If a export was opened with readahead enabled it should
futher not be allowed to disable the cache while running.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1463662083-20814-2-git-send-email-pl@kamp.de
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-06-28 22:52:45 -04:00
Niels de Vos 947eb2030e block/gluster: add support for SEEK_DATA/SEEK_HOLE
GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
it possible to detect sparse areas in files.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2016-06-28 22:52:45 -04:00
Peter Maydell b0ad00b8c9 -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
 
 iQEcBAABAgAGBQJXaFInAAoJEJykq7OBq3PI6VsH/0Sfgbdo1RksYuQwb/y92sCW
 EN+lxUZ+OLfgrc8PYgNZwfSM3rsfYhznL0MAXOeEe7Ahabi07w7DhGR8WvwfAOlI
 G96FRuvrIPfv5u6U6fwS4CvG3TIHVLxfHKCsTpPUmH8U5CNx/x/tpjNiWN1dj6t+
 sXybSjYHfZfiZy2tI9MFIFWCdxnF/pl0QAPhbRqc8Y/RQTDrPKRjLpz+nitN/u96
 5TS7KlELyQuP91YMmLceYSmIkHbxW703h+iE2n4hov0uZCP8Jil+2Jsd3ziQSRlL
 j6LqexQ2ViBGdDSfiZGYES2VPlsHOCwb4G+IgWBStfZg1ppaXENvcDzPrgrB+L4=
 =eUnF
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging

# gpg: Signature made Mon 20 Jun 2016 21:29:27 BST
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/tracing-pull-request: (42 commits)
  trace: split out trace events for linux-user/ directory
  trace: split out trace events for qom/ directory
  trace: split out trace events for target-ppc/ directory
  trace: split out trace events for target-s390x/ directory
  trace: split out trace events for target-sparc/ directory
  trace: split out trace events for net/ directory
  trace: split out trace events for audio/ directory
  trace: split out trace events for ui/ directory
  trace: split out trace events for hw/alpha/ directory
  trace: split out trace events for hw/arm/ directory
  trace: split out trace events for hw/acpi/ directory
  trace: split out trace events for hw/vfio/ directory
  trace: split out trace events for hw/s390x/ directory
  trace: split out trace events for hw/pci/ directory
  trace: split out trace events for hw/ppc/ directory
  trace: split out trace events for hw/9pfs/ directory
  trace: split out trace events for hw/i386/ directory
  trace: split out trace events for hw/isa/ directory
  trace: split out trace events for hw/sd/ directory
  trace: split out trace events for hw/sparc/ directory
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-20 22:30:34 +01:00
Daniel P. Berrange b54ca48e40 trace: split out trace events for block/ directory
Move all trace-events for files in the block/ directory to
their own file.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1466066426-16657-7-git-send-email-berrange@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-20 17:22:14 +01:00
Peter Maydell 7fa124b273 Error reporting patches for 2016-06-20
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIbBAABAgAGBQJXaAQPAAoJEDhwtADrkYZTbPMP+OmXlf1IwcUONhaHqD0SXAdC
 Q5NNHheNHbIdzrsOmgj2GAxCeGG08REPrdYLb8N9lI3Fev8ggQ0v3d9qiStL6Nvz
 2B7DeTxXba9Qmc2gY86RR4f7T6VK+8k1Kj9f3YcAoXHucdjDLRmtUozwES3Kk8mn
 oeBqd07yuTAO93xQhF6BDv3mfagPZV+kp15Se04ufWPpT9eG3Cp9Pl4/Yad/wJUr
 2B4pIvYnhISVlZIOvpuzdydBaCb/U0YefjWBDkawLwRBtM/kpNKoDdqr2UA0yChQ
 Xulgpt+UQLZ4aOYGLkzqBT7v3kDeEnYUFDdE03xZBLbHddOU05NEhLgzRVUEAdUS
 EAKWvWu692USDh9a2r0AYLqZW6J0+Fyi4fZ34gSuP69oi74CIvjJHDAEwsSdt6bm
 WHCchvfIk9kinzGiFA3hkAbVWAbI0EpB0J7k3mgCVVVelBpF9b9U4M2ydx9ZPufX
 t4ULXxQO9Qyxr7r0uiznugQAjIRwLMDOPPd1F5Vi+LX0wF3Pcrtc/F+3ehoLBjBD
 6zZA5lZVlUEISdzzytfjeX3ch5vRFwlujrTd2anxRuduuBV1/ztvE1v0Nsiii6Sw
 15BaHQJvBnMnAIO5LfhnAaVpIBk0jpQpwVyRcNUpeZ1VU2qlqVS4zxYZ0FDQgyer
 IZa1qYJ0Sz+oMkq9RXU=
 =TptL
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2016-06-20' into staging

Error reporting patches for 2016-06-20

# gpg: Signature made Mon 20 Jun 2016 15:56:15 BST
# gpg:                using RSA key 0x3870B400EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-error-2016-06-20:
  log: Fix qemu_set_log_filename() error handling
  log: Fix qemu_set_dfilter_ranges() error reporting
  log: Plug memory leak on multiple -dfilter
  coccinelle: Remove unnecessary variables for function return value
  error: Remove unnecessary local_err variables
  error: Remove NULL checks on error_propagate() calls
  vl: Error messages need to go to stderr, fix some

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-20 16:19:18 +01:00
Eduardo Habkost 9be385980d coccinelle: Remove unnecessary variables for function return value
Use Coccinelle script to replace 'ret = E; return ret' with
'return E'. The script will do the substitution only when the
function return type and variable type are the same.

Manual fixups:

* audio/audio.c: coding style of "read (...)" and "write (...)"
* block/qcow2-cluster.c: wrap line to make it shorter
* block/qcow2-refcount.c: change indentation of wrapped line
* target-tricore/op_helper.c: fix coding style of
  "remainder|quotient"
* target-mips/dsp_helper.c: reverted changes because I don't
  want to argue about checkpatch.pl
* ui/qemu-pixman.c: fix line indentation
* block/rbd.c: restore blank line between declarations and
  statements

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1465855078-19435-4-git-send-email-ehabkost@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Unused Coccinelle rule name dropped along with a redundant comment;
whitespace touched up in block/qcow2-cluster.c; stale commit message
paragraph deleted]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-06-20 16:38:13 +02:00
Eduardo Habkost 6b62d96137 error: Remove unnecessary local_err variables
This patch simplifies code that uses a local_err variable just to
immediately use it for an error_propagate() call.

Coccinelle patch used to perform the changes added to
scripts/coccinelle/remove_local_err.cocci.

Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1465855078-19435-3-git-send-email-ehabkost@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Blank line in s390-virtio-ccw.c restored]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-06-20 16:38:13 +02:00
Eduardo Habkost 621ff94d50 error: Remove NULL checks on error_propagate() calls
error_propagate() already ignores local_err==NULL, so there's no
need to check it before calling.

Coccinelle patch used to perform the changes added to
scripts/coccinelle/error_propagate_null.cocci.

Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1465855078-19435-2-git-send-email-ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-06-20 16:38:13 +02:00
Stefan Hajnoczi 5ab4b69ce2 backup: follow AioContext change gracefully
Move s->target to the new AioContext when there is an AioContext change.

The backup_run() coroutine does not use asynchronous I/O so there is no
need to wait for in-flight requests in a BlockJobDriver->pause()
callback.

Guest writes are intercepted by the backup job.  Treat them as guest
activity and do it even while the job is paused.  This is necessary
since the only alternative would be to fail a job that experienced guest
writes during pause once the job is resumed.  In practice the guest
writes don't interfere with AioContext switching since bdrv_drain() is
used by bdrv_set_aio_context().

Loops already contain pause points because of block_job_sleep_ns() calls
in the yield_and_check() helper function.  It is necessary to convert a
raw qemu_coroutine_yield() to block_job_yield() so the
MIRROR_SYNC_MODE_NONE case can pause.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1466096189-6477-9-git-send-email-stefanha@redhat.com
2016-06-20 14:25:41 +01:00
Stefan Hajnoczi 565ac01f8d mirror: follow AioContext change gracefully
Add block_job_pause_point() calls to mark quiescent points and make sure
to complete in-flight requests when switching AioContexts.

This patch solves undefined behavior in the mirror block job when the
BDS AioContext is changed by dataplane.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1466096189-6477-8-git-send-email-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-20 14:25:41 +01:00
Denis V. Lunev ec050f77a5 block: process before_write_notifiers in bdrv_co_discard
This is mandatory for correct backup creation. In the other case the
content under this area would be lost.

Dirty bits are set exactly like in bdrv_aligned_pwritev, i.e. they are set
even if notifier has returned a error.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1466093381-6120-4-git-send-email-den@openvz.org
CC: Fam Zheng <famz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-20 11:44:12 +01:00
Denis V. Lunev 968d8b0627 block: fix race in bdrv_co_discard with drive-mirror
Actually we must set dirty bitmap dirty after we have written all our
zeroes for correct processing in drive mirror code. In the other case
we can face not zeroes in this area in mirror_iteration.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1466093381-6120-3-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-20 11:44:12 +01:00
Denis V. Lunev 3a36e474f2 block: fixed BdrvTrackedRequest filling in bdrv_co_discard
The request area is specified in bytes, not in sectors.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1466093381-6120-2-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-20 11:44:12 +01:00
Paolo Bonzini 02d0e09503 os-posix: include sys/mman.h
qemu/osdep.h checks whether MAP_ANONYMOUS is defined, but this check
is bogus without a previous inclusion of sys/mman.h.  Include it in
sysemu/os-posix.h and remove it from everywhere else.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:03 +02:00
Max Reitz 67882b1535 block/null: Implement bdrv_refresh_filename()
The null block driver ignores any filename used for creating its BDSs,
which allows creating such BDSs even without any filename at all. In
that case, we currently construct a JSON filename when queried instead
of a plain "null-co://" or "null-aio://". This patch implements
bdrv_refresh_filename() to remedy this behavior.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160610185750.30956-4-mreitz@redhat.com
[mreitz@redhat.com: Added commit message]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-16 15:20:37 +02:00
Max Reitz 274fccee2b block/mirror: Fix target backing BDS
Currently, we are trying to move the backing BDS from the source to the
target in bdrv_replace_in_backing_chain() which is called from
mirror_exit(). However, mirror_complete() already tries to open the
target's backing chain with a call to bdrv_open_backing_file().

First, we should only set the target's backing BDS once. Second, the
mirroring block job has a better idea of what to set it to than the
generic code in bdrv_replace_in_backing_chain() (in fact, the latter's
conditions on when to move the backing BDS from source to target are not
really correct).

Therefore, remove that code from bdrv_replace_in_backing_chain() and
leave it to mirror_complete().

Depending on what kind of mirroring is performed, we furthermore want to
use different strategies to open the target's backing chain:

- If blockdev-mirror is used, we can assume the user made sure that the
  target already has the correct backing chain. In particular, we should
  not try to open a backing file if the target does not have any yet.

- If drive-mirror with mode=absolute-paths is used, we can and should
  reuse the already existing chain of nodes that the source BDS is in.
  In case of sync=full, no backing BDS is required; with sync=top, we
  just link the source's backing BDS to the target, and with sync=none,
  we use the source BDS as the target's backing BDS.
  We should not try to open these backing files anew because this would
  lead to two BDSs existing per physical file in the backing chain, and
  we would like to avoid such concurrent access.

- If drive-mirror with mode=existing is used, we have to use the
  information provided in the physical image file which means opening
  the target's backing chain completely anew, just as it has been done
  already.
  If the target's backing chain shares images with the source, this may
  lead to multiple BDSs per physical image file. But since we cannot
  reliably ascertain this case, there is nothing we can do about it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160610185750.30956-3-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-16 15:20:37 +02:00
Vikhyat Umrao 87cd3d20e1 rbd:change error_setg() to error_setg_errno()
Ceph RBD block driver does not use error_setg_errno() where
it is possible to use. This patch replaces error_setg()
from error_setg_errno().

Signed-off-by: Vikhyat Umrao <vumrao@redhat.com>
Message-id: 1462780319-5796-1-git-send-email-vumrao@redhat.com
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-16 15:20:37 +02:00
Alberto Garcia 834fe28ddf block: Create the commit block job before reopening any image
If the base or overlay images need to be reopened in read-write mode
but the block_job_create() call fails then no one will put those
images back in read-only mode.

We can solve this problem easily by calling block_job_create() first.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: aa495045770a6f1a7cc5d408397a17c75097fdd8.1464346103.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-16 15:20:37 +02:00
Alberto Garcia eb1364ceac block: use the block job list in bdrv_drain_all()
bdrv_drain_all() pauses all block jobs by using bdrv_next() to iterate
over all top-level BlockDriverStates. Therefore the code is unable to
find block jobs in other nodes.

This patch uses block_job_next() to iterate over all block jobs.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 55ee7d7d4a65c28aa1a1b28823897ef326f328e2.1464346103.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-16 15:20:37 +02:00
Kevin Wolf c9d20029f4 block: Remove bs->zero_beyond_eof
It is always true for open images now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:56 +02:00
Kevin Wolf 734a77584a qcow2: Let vmstate call qcow2_co_preadv/pwrite directly
We don't really want to go through the block layer in order to read from
or write to the vmstate in a qcow2 image. Doing so required a few ugly
hacks like saving and restoring the old image size (because writing to
vmstate offsets would increase the image size) or disabling the "reads
after EOF = zeroes" logic. When calling the right functions directly,
these hacks aren't necessary any more.

Note that .bdrv_vmstate_load/save() return 0 instead of the number of
bytes in case of success now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:56 +02:00
Kevin Wolf 1a8ae82217 block: Make bdrv_load/save_vmstate coroutine_fns
This allows drivers to share code between normal I/O and vmstate
accesses.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:56 +02:00
Kevin Wolf b433d9424d block: Allow .bdrv_load/save_vmstate() to return 0/-errno
The return value of .bdrv_load/save_vmstate() can be any non-negative
number in case of success now. It used to be bytes/-errno.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:55 +02:00
Kevin Wolf 5ddda0b8f0 block: Make .bdrv_load_vmstate() vectored
This brings it in line with .bdrv_save_vmstate().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:55 +02:00
Kevin Wolf f1e8474115 block: Introduce bdrv_preadv()
We already have a byte-based bdrv_pwritev(), but the read counterpart
was still missing. This commit adds it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:55 +02:00
Kevin Wolf ccb9dc1012 linux-aio: Cancel BH if not needed
linux-aio uses a BH in order to make sure that the remaining completions
are processed even in nested event loops of completion callbacks in
order to avoid deadlocks.

There is no need, however, to have the BH overhead for the first call
into qemu_laio_completion_bh() or after all pending completions have
already been processed. Therefore, this patch calls directly into
qemu_laio_completion_bh() in qemu_laio_completion_cb() and cancels
the BH after qemu_laio_completion_bh() has processed all pending
completions.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 15:19:55 +02:00
Kevin Wolf 23b0d9fb1d block: Don't enforce 512 byte minimum alignment
If block drivers say that they can do an alignment < 512 bytes, let's
just suppose they mean it. raw-posix used to be an offender with respect
to this, but it can actually deal with byte-aligned requests now.

The default is still 512 bytes for any drivers that only implement
sector-based interfaces, but it is 1 now for drivers that implement
.bdrv_co_preadv.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:55 +02:00
Kevin Wolf 9d52aa3c38 raw-posix: Implement .bdrv_co_preadv/pwritev
The raw-posix block driver actually supports byte-aligned requests now
on non-O_DIRECT images, like it already (and previously incorrectly)
claimed in bs->request_alignment.

For some block drivers this means that a RMW cycle can be avoided when
they write sub-sector metadata e.g. for cluster allocation.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:55 +02:00
Kevin Wolf 2174f12bde raw-posix: Switch to bdrv_co_* interfaces
In order to use the modern byte-based .bdrv_co_preadv/pwritev()
interface, this patch switches raw-posix to coroutine-based interfaces
as a first step. In terms of semantics and performance, it doesn't make
a difference with the existing code whether we go from a coroutine to a
callback-based interface already in block/io.c or only in linux-aio.c

As there have been concerns in the past that this change may be a step
in the wrong direction with respect to a possible AIO fast path, the
old callback-based interface for linux-aio is left around and can be
reactivated when a fast path (e.g. directly from virtio-blk dataplane,
bypassing the whole block layer) is implemented.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:55 +02:00
Kevin Wolf 9896c8765f block: Prepare bdrv_aligned_pwritev() for byte-aligned requests
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:55 +02:00
Kevin Wolf 49c0752600 block: Prepare bdrv_aligned_preadv() for byte-aligned requests
This patch makes bdrv_aligned_preadv() ready to accept byte-aligned
requests. Note that this doesn't mean that such requests are actually
made. The caller still ensures that all requests are aligned to at least
512 bytes.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:55 +02:00
Kevin Wolf 244483e64e block: Byte-based bdrv_co_do_copy_on_readv()
In a first step to convert the common I/O path to work on bytes rather
than sectors, this converts the copy-on-read logic that is used by
bdrv_aligned_preadv().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:55 +02:00
Daniel P. Berrange 8c0dcbc4ad block: drop support for using qcow[2] encryption with system emulators
Back in the 2.3.0 release we declared qcow[2] encryption as
deprecated, warning people that it would be removed in a future
release.

  commit a1f688f415
  Author: Markus Armbruster <armbru@redhat.com>
  Date:   Fri Mar 13 21:09:40 2015 +0100

    block: Deprecate QCOW/QCOW2 encryption

The code still exists today, but by a (happy?) accident we entirely
broke the ability to use qcow[2] encryption in the system emulators
in the 2.4.0 release due to

  commit 8336aafae1
  Author: Daniel P. Berrange <berrange@redhat.com>
  Date:   Tue May 12 17:09:18 2015 +0100

    qcow2/qcow: protect against uninitialized encryption key

This commit was designed to prevent future coding bugs which
might cause QEMU to read/write data on an encrypted block
device in plain text mode before a decryption key is set.

It turns out this preventative measure was a little too good,
because we already had a long standing bug where QEMU read
encrypted data in plain text mode during system emulator
startup, in order to guess disk geometry:

  Thread 10 (Thread 0x7fffd3fff700 (LWP 30373)):
  #0  0x00007fffe90b1a28 in raise () at /lib64/libc.so.6
  #1  0x00007fffe90b362a in abort () at /lib64/libc.so.6
  #2  0x00007fffe90aa227 in __assert_fail_base () at /lib64/libc.so.6
  #3  0x00007fffe90aa2d2 in  () at /lib64/libc.so.6
  #4  0x000055555587ae19 in qcow2_co_readv (bs=0x5555562accb0, sector_num=0, remaining_sectors=1, qiov=0x7fffffffd260) at block/qcow2.c:1229
  #5  0x000055555589b60d in bdrv_aligned_preadv (bs=bs@entry=0x5555562accb0, req=req@entry=0x7fffd3ffea50, offset=offset@entry=0, bytes=bytes@entry=512, align=align@entry=512, qiov=qiov@entry=0x7fffffffd260, flags=0) at block/io.c:908
  #6  0x000055555589b8bc in bdrv_co_do_preadv (bs=0x5555562accb0, offset=0, bytes=512, qiov=0x7fffffffd260, flags=<optimized out>) at block/io.c:999
  #7  0x000055555589c375 in bdrv_rw_co_entry (opaque=0x7fffffffd210) at block/io.c:544
  #8  0x000055555586933b in coroutine_thread (opaque=0x555557876310) at coroutine-gthread.c:134
  #9  0x00007ffff64e1835 in g_thread_proxy (data=0x5555562b5590) at gthread.c:778
  #10 0x00007ffff6bb760a in start_thread () at /lib64/libpthread.so.0
  #11 0x00007fffe917f59d in clone () at /lib64/libc.so.6

  Thread 1 (Thread 0x7ffff7ecab40 (LWP 30343)):
  #0  0x00007fffe91797a9 in syscall () at /lib64/libc.so.6
  #1  0x00007ffff64ff87f in g_cond_wait (cond=cond@entry=0x555555e085f0 <coroutine_cond>, mutex=mutex@entry=0x555555e08600 <coroutine_lock>) at gthread-posix.c:1397
  #2  0x00005555558692c3 in qemu_coroutine_switch (co=<optimized out>) at coroutine-gthread.c:117
  #3  0x00005555558692c3 in qemu_coroutine_switch (from_=0x5555562b5e30, to_=to_@entry=0x555557876310, action=action@entry=COROUTINE_ENTER) at coroutine-gthread.c:175
  #4  0x0000555555868a90 in qemu_coroutine_enter (co=0x555557876310, opaque=0x0) at qemu-coroutine.c:116
  #5  0x0000555555859b84 in thread_pool_completion_bh (opaque=0x7fffd40010e0) at thread-pool.c:187
  #6  0x0000555555859514 in aio_bh_poll (ctx=ctx@entry=0x5555562953b0) at async.c:85
  #7  0x0000555555864d10 in aio_dispatch (ctx=ctx@entry=0x5555562953b0) at aio-posix.c:135
  #8  0x0000555555864f75 in aio_poll (ctx=ctx@entry=0x5555562953b0, blocking=blocking@entry=true) at aio-posix.c:291
  #9  0x000055555589c40d in bdrv_prwv_co (bs=bs@entry=0x5555562accb0, offset=offset@entry=0, qiov=qiov@entry=0x7fffffffd260, is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at block/io.c:591
  #10 0x000055555589c503 in bdrv_rw_co (bs=bs@entry=0x5555562accb0, sector_num=sector_num@entry=0, buf=buf@entry=0x7fffffffd2e0 "\321,", nb_sectors=nb_sectors@entry=21845, is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at block/io.c:614
  #11 0x000055555589c562 in bdrv_read_unthrottled (nb_sectors=21845, buf=0x7fffffffd2e0 "\321,", sector_num=0, bs=0x5555562accb0) at block/io.c:622
  #12 0x000055555589c562 in bdrv_read_unthrottled (bs=0x5555562accb0, sector_num=sector_num@entry=0, buf=buf@entry=0x7fffffffd2e0 "\321,", nb_sectors=nb_sectors@entry=21845) at block/io.c:634
    nb_sectors@entry=1) at block/block-backend.c:504
  #14 0x0000555555752e9f in guess_disk_lchs (blk=blk@entry=0x5555562a5290, pcylinders=pcylinders@entry=0x7fffffffd52c, pheads=pheads@entry=0x7fffffffd530, psectors=psectors@entry=0x7fffffffd534) at hw/block/hd-geometry.c:68
  #15 0x0000555555752ff7 in hd_geometry_guess (blk=0x5555562a5290, pcyls=pcyls@entry=0x555557875d1c, pheads=pheads@entry=0x555557875d20, psecs=psecs@entry=0x555557875d24, ptrans=ptrans@entry=0x555557875d28) at hw/block/hd-geometry.c:133
  #16 0x0000555555752b87 in blkconf_geometry (conf=conf@entry=0x555557875d00, ptrans=ptrans@entry=0x555557875d28, cyls_max=cyls_max@entry=65536, heads_max=heads_max@entry=16, secs_max=secs_max@entry=255, errp=errp@entry=0x7fffffffd5e0) at hw/block/block.c:71
  #17 0x0000555555799bc4 in ide_dev_initfn (dev=0x555557875c80, kind=IDE_HD) at hw/ide/qdev.c:174
  #18 0x0000555555768394 in device_realize (dev=0x555557875c80, errp=0x7fffffffd640) at hw/core/qdev.c:247
  #19 0x0000555555769a81 in device_set_realized (obj=0x555557875c80, value=<optimized out>, errp=0x7fffffffd730) at hw/core/qdev.c:1058
  #20 0x00005555558240ce in property_set_bool (obj=0x555557875c80, v=<optimized out>, opaque=0x555557875de0, name=<optimized out>, errp=0x7fffffffd730)
        at qom/object.c:1514
  #21 0x0000555555826c87 in object_property_set_qobject (obj=obj@entry=0x555557875c80, value=value@entry=0x55555784bcb0, name=name@entry=0x55555591cb3d "realized", errp=errp@entry=0x7fffffffd730) at qom/qom-qobject.c:24
  #22 0x0000555555825760 in object_property_set_bool (obj=obj@entry=0x555557875c80, value=value@entry=true, name=name@entry=0x55555591cb3d "realized", errp=errp@entry=0x7fffffffd730) at qom/object.c:905
  #23 0x000055555576897b in qdev_init_nofail (dev=dev@entry=0x555557875c80) at hw/core/qdev.c:380
  #24 0x0000555555799ead in ide_create_drive (bus=bus@entry=0x555557629630, unit=unit@entry=0, drive=0x5555562b77e0) at hw/ide/qdev.c:122
  #25 0x000055555579a746 in pci_ide_create_devs (dev=dev@entry=0x555557628db0, hd_table=hd_table@entry=0x7fffffffd830) at hw/ide/pci.c:440
  #26 0x000055555579b165 in pci_piix3_ide_init (bus=<optimized out>, hd_table=0x7fffffffd830, devfn=<optimized out>) at hw/ide/piix.c:218
  #27 0x000055555568ca55 in pc_init1 (machine=0x5555562960a0, pci_enabled=1, kvmclock_enabled=<optimized out>) at /home/berrange/src/virt/qemu/hw/i386/pc_piix.c:256
  #28 0x0000555555603ab2 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4249

So the safety net is correctly preventing QEMU reading cipher
text as if it were plain text, during startup and aborting QEMU
to avoid bad usage of this data.

For added fun this bug only happens if the encrypted qcow2
file happens to have data written to the first cluster,
otherwise the cluster won't be allocated and so qcow2 would
not try the decryption routines at all, just return all 0's.

That no one even noticed, let alone reported, this bug that
has shipped in 2.4.0, 2.5.0 and 2.6.0 shows that the number
of actual users of encrypted qcow2 is approximately zero.

So rather than fix the crash, and backport it to stable
releases, just go ahead with what we have warned users about
and disable any use of qcow2 encryption in the system
emulators. qemu-img/qemu-io/qemu-nbd are still able to access
qcow2 encrypted images for the sake of data conversion.

In the future, qcow2 will gain support for the alternative
luks format, but when this happens it'll be using the
'-object secret' infrastructure for getting keys, which
avoids this problematic scenario entirely.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-16 15:19:55 +02:00
Eric Blake fa16653874 block: Assert that flags are in range
Add a new BDRV_REQ_MASK constant, and use it to make sure that
caller flags are always valid.

Tested with 'make check' and with qemu-iotests on both '-raw'
and '-qcow2'; the only failure turned up was fixed in the
previous commit.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-16 15:19:55 +02:00
Eric Blake 73698c30ca block: Avoid bogus flags during mirroring
Commit e253f4b8 converted mirroring from sector-based bdrv_aio_*
to byte-based blk_aio_*, but failed to account for the subtle
difference in signatures (the former takes a semi-redundant length,
the latter takes a flags parameter).  Since all of our flags are
currently smaller in size than BDRV_SECTOR_SIZE, it has no ill
effects until we either perform sub-sector mirroring, or we start
asserting that no unexpected flags are set.  I found it while
testing new asserts when qemu-iotests 132 started warning about an
unknown flag 0x200000.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-16 15:19:55 +02:00
Kevin Wolf d46a0bb24d qcow2: Implement .bdrv_co_pwritev()
This changes qcow2 to implement the byte-based .bdrv_co_pwritev
interface rather than the sector-based old one.

As preallocation uses the same allocation function as normal writes, and
the interface of that function needs to be changed, it is converted in
the same patch.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-06-16 15:19:55 +02:00
Kevin Wolf 8556739355 qcow2: Use bytes instead of sectors for QCowL2Meta
In preparation for implementing .bdrv_co_pwritev in qcow2.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-06-16 15:19:55 +02:00
Kevin Wolf aaa4d20b49 qcow2: Make copy_sectors() byte based
This will allow copy on write operations where the overwritten part of
the cluster is not aligned to sector boundaries.

Also rename the function because it has nothing to do with sectors any
more.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-06-16 15:19:55 +02:00
Kevin Wolf ecfe186380 qcow2: Implement .bdrv_co_preadv()
Reading from qcow2 images is now byte granularity.

Most of the affected code in qcow2 actually gets simpler with this
change. The only exception is encryption, which is fixed on 512 bytes
blocks; in order to keep this working, bs->request_alignment is set for
encrypted images.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-06-16 15:19:55 +02:00
Kevin Wolf b2f65d6b02 qcow2: Work with bytes in qcow2_get_cluster_offset()
This patch changes the units that qcow2_get_cluster_offset() uses
internally, without touching the interface just yet. This will be done
in another patch.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-06-16 15:19:55 +02:00
Kevin Wolf 515c2f431e block: Don't emulate natively supported pwritev flags
Drivers that implement .bdrv_co_pwritev() get the flags passed as an
argument to said function, but we also unconditionally emulate the flags
anyway. We shouldn't do that.

Fix this by clearing all flags that the driver supports natively after
it returns from .bdrv_co_pwritev().

Fixes: 4df863f3 ('block: Make supported_write_flags a per-bds property')
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-06-08 10:21:09 +02:00
Kevin Wolf 2a9170bcd4 block: Fix bdrv_all_delete_snapshot() error handling
The code to exit the loop after bdrv_snapshot_delete_by_id_or_name()
returned failure was duplicated. The first copy of it was too early so
that the AioContext lock would not be freed. This patch removes it so
that only the second, correct copy remains.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-08 10:21:09 +02:00
Denis V. Lunev f3c3b87dae qcow2: avoid extra flushes in qcow2
The problem with excessive flushing was found by a couple of performance
tests:
  - parallel directory tree creation (from 2 processes)
  - 32 cached writes + fsync at the end in a loop

For the first one results improved from 2.6 loops/sec to 3.5 loops/sec.
Each loop creates 10^3 directories with 10 files in each.

For the second one results improved from ~600 fsync/sec to ~1100
fsync/sec. Though, it was run on SSD so it probably won't show such
performance gain on rotational media.

qcow2_cache_flush() calls bdrv_flush() unconditionally after writing
cache entries of a particular cache. This can lead to as many as
2 additional fdatasyncs inside bdrv_flush.

We can simply skip all fdatasync calls inside qcow2_co_flush_to_os
as bdrv_flush for sure will do the job. These flushes are necessary to
keep the right order of writes to the different caches. Though this is
not necessary in the current code base as this ordering is ensured through
the flush in qcow2_cache_flush_dependency().

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Pavel Borzenkov <pborzenkov@virtuozzo.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08 10:21:09 +02:00
Fam Zheng 6f6071745b raw-posix: Fetch max sectors for host block device
This is sometimes a useful value we should count in.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08 10:21:09 +02:00
Eric Blake c1499a5e73 block: Kill bdrv_co_write_zeroes()
Now that all drivers have been converted to a byte interface,
we no longer need a sector interface.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08 10:21:08 +02:00
Eric Blake a620f2ae15 vmdk: Convert to bdrv_co_pwrite_zeroes()
Another step on our continuing quest to switch to byte-based
interfaces.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08 10:21:08 +02:00
Eric Blake 39ad937e16 raw_bsd: Convert to bdrv_co_pwrite_zeroes()
Another step on our continuing quest to switch to byte-based
interfaces.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08 10:21:08 +02:00
Eric Blake 2ffa76c2bf raw-posix: Convert to bdrv_co_pwrite_zeroes()
Another step on our continuing quest to switch to byte-based
interfaces.

Signed-off-by: Eric Blake <eblake@redhat.com>
[ kwolf: Fixed up trace_paio_submit_co() call for qiov == NULL ]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08 10:21:08 +02:00
Eric Blake 49a2e48348 qed: Convert to bdrv_co_pwrite_zeroes()
Another step on our continuing quest to switch to byte-based
interfaces.

Kill an abuse of the comma operator while at it (fortunately,
the semantics were still right).  Also, the test for requests
not aligned to clusters should be applied always, not just
when a backing file is present.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08 10:21:08 +02:00
Eric Blake e88a36ebad gluster: Convert to bdrv_co_pwrite_zeroes()
Another step on our continuing quest to switch to byte-based
interfaces.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08 10:21:08 +02:00
Eric Blake 9c21a4220b blkreplay: Convert to bdrv_co_pwrite_zeroes()
Another step on our continuing quest to switch to byte-based
interfaces.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08 10:21:08 +02:00
Eric Blake 5544b59f8e qcow2: Convert to bdrv_co_pwrite_zeroes()
Another step on our continuing quest to switch to byte-based
interfaces.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08 10:21:08 +02:00
Eric Blake 94d047a35b iscsi: Convert to bdrv_co_pwrite_zeroes()
Another step on our continuing quest to switch to byte-based
interfaces.

As this is the first byte-based iscsi interface, convert
is_request_lun_aligned() into two versions, one for sectors
and one for bytes.  Also, change from outright -EINVAL failure
on an unaligned request, to instead failing with -ENOTSUP to
trigger a read-modify-write fallback, particularly since the
block layer should be honoring bs->request_alignment to avoid
-EINVAL on read/write requests.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08 10:21:08 +02:00
Eric Blake 74021bc497 block: Switch bdrv_write_zeroes() to byte interface
Rename to bdrv_pwrite_zeroes() to let the compiler ensure we
cater to the updated semantics.  Do the same for bdrv_co_write_zeroes().

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08 10:21:08 +02:00
Eric Blake d05aa8bb4a block: Add .bdrv_co_pwrite_zeroes()
Update bdrv_co_do_write_zeroes() to be byte-based, and select
between the new byte-based bdrv_co_pwrite_zeroes() or the old
bdrv_co_write_zeroes().  The next patches will convert drivers,
then remove the old interface.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08 10:21:08 +02:00
Eric Blake cf081fca4e block: Track write zero limits in bytes
Another step towards removing sector-based interfaces: convert
the maximum write and minimum alignment values from sectors to
bytes.  Rename the variables to let the compiler check that all
users are converted to the new semantics.

The maximum remains an int as long as BDRV_REQUEST_MAX_SECTORS
is constrained by INT_MAX (this means that we can't even
support a 2G write_zeroes, but just under it) - changing
operation lengths to unsigned or to 64-bits is a much bigger
audit, and debatable if we even want to do it (since at the
core, a 32-bit platform will still have ssize_t as its
underlying limit on write()).

Meanwhile, alignment is changed to 'uint32_t', since it makes no
sense to have an alignment larger than the maximum write, and
less painful to use an unsigned type with well-defined behavior
in bit operations than to have to worry about what happens if
a driver mistakenly supplies a negative alignment.

Add an assert that no one was trying to use sectors to get a
write zeroes larger than 2G, and therefore that a later conversion
to bytes won't be impacted by keeping the limit at 32 bits.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08 10:21:08 +02:00
Eric Blake 8b18474451 iscsi: Use block size as minimum zero/discard alignment
If hardware does not advertise a minimum zero/discard
alignment, we still want to guarantee that the block layer
will align requests to our blocks, rather than the arbitrary
512-byte BDRV sector size.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08 10:21:08 +02:00
Eric Blake ebb718a5c7 qcow2: Catch more unaligned write_zero into zero cluster
is_zero_cluster() and is_zero_cluster_top_locked() are used only
by qcow2_co_write_zeroes().  The former is too broad (we don't
care if the sectors we are about to overwrite are non-zero, only
that all other sectors in the cluster are zero), so it needs to
be called up to twice but with smaller limits - rename it along
with adding the neeeded parameter.  The latter can be inlined for
more compact code.

The testsuite change shows that we now have a sparser top file
when an unaligned write_zeroes overwrites the only portion of
the backing file with data.

Based on a patch proposal by Denis V. Lunev.

CC: Denis V. Lunev <den@openvz.org>
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08 10:21:08 +02:00
Denis V. Lunev 5a64e94251 qcow2: add tracepoints for qcow2_co_write_zeroes
This patch follows guidelines of all other tracepoints in qcow2, like ones
in qcow2_co_writev. I think that they should dump values in the same
quantities or be changed all together.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Eric Blake <eblake@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Message-Id: <1463476543-3087-4-git-send-email-den@openvz.org>
[eblake: typo fix in commit message]
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08 10:21:08 +02:00
Denis V. Lunev ba142846b0 qcow2: simplify logic in qcow2_co_write_zeroes
Unaligned requests will occupy only one cluster. This is true since the
previous commit. Simplify the code taking this consideration into
account.

In other words, the caller is now buggy if it ever passes us an unaligned
request that crosses cluster boundaries (the only requests that can cross
boundaries will be aligned).

There are no other changes so far.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Message-Id: <1463476543-3087-3-git-send-email-den@openvz.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08 10:21:08 +02:00
Denis V. Lunev 443668ca40 block: split write_zeroes always
We should split requests even if they are less than write_zeroes_alignment.
For example we can have the following request:
  offset 62k
  size   4k
  write_zeroes_alignment 64k
The original code sent 1 request covering 2 qcow2 clusters, and resulted
in both clusters being allocated. But by splitting the request, we can
cater to the case where one of the two clusters can be zeroed as a
whole, for only 1 cluster allocated after the operation.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Eric Blake <eblake@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Message-Id: <1463476543-3087-2-git-send-email-den@openvz.org>

[eblake: Avoid exceeding nb_sectors, hoist alignment checks out of
loop, and update testsuite to show that patch works]

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08 10:21:08 +02:00
Peter Maydell 6ed5546fa7 trivial patches for 2016-06-07
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXVuZUAAoJEL7lnXSkw9fb1VQH/ioMIWojXXTQS4sDmelGMMlK
 P3RnXmTtCqA5QYy5BXyngmjNY29ne2o5UiRm8uESLHbsf0dpQyUudLoVQhXIG10n
 TtT4veK8boO+96heCCBbA3X53SIo/Ta+gHmQlCD24wqgtZphbdXnjRTSEA8IT3N6
 1mIyEfG/avMUgq/qTF2JoeKBXscY6CVuQdH/a8QzQ0Cnmf6nBJ0NfZ7XiuEQAJ2S
 eVUkeh3SJKrJeMrk9yw8rnxKBIP27IwbEvrVcryMOdBltiW91rI/79CveANxY5S0
 6TeZrIFyb11pAPz741BpzUWjwEzs4PB6ZeJ8uy28ldJPbQfLIe18ELIV7vhpzVQ=
 =foxH
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mjt/tags/pull-trivial-patches-2016-06-07' into staging

trivial patches for 2016-06-07

# gpg: Signature made Tue 07 Jun 2016 16:20:52 BST
# gpg:                using RSA key 0xBEE59D74A4C3D7DB
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>"
# gpg:                 aka "Michael Tokarev <mjt@corpit.ru>"
# gpg:                 aka "Michael Tokarev <mjt@debian.org>"

* remotes/mjt/tags/pull-trivial-patches-2016-06-07: (51 commits)
  hbitmap: Use DIV_ROUND_UP
  qemu-timer: Use DIV_ROUND_UP
  linux-user: Use DIV_ROUND_UP
  slirp: Use DIV_ROUND_UP
  usb: Use DIV_ROUND_UP
  rocker: Use DIV_ROUND_UP
  SPICE: Use DIV_ROUND_UP
  audio: Use DIV_ROUND_UP
  xen: Use DIV_ROUND_UP
  crypto: Use DIV_ROUND_UP
  block: Use DIV_ROUND_UP
  qed: Use DIV_ROUND_UP
  qcow/qcow2: Use DIV_ROUND_UP
  parallels: Use DIV_ROUND_UP
  coccinelle: use macro DIV_ROUND_UP instead of (((n) + (d) - 1) /(d))
  thunk: Rename args and fields in host-target bitmask conversion code
  thunk: Drop unused NO_THUNK_TYPE_SIZE guards
  qemu-common.h: Drop WORDS_ALIGNED define
  host-utils: Prefer 'false' for bool type
  docs/multi-thread-compression: Fix wrong command string
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-07 16:34:45 +01:00
Laurent Vivier 13385ae168 block: Use DIV_ROUND_UP
Replace (((n) + (d) - 1) /(d)) by DIV_ROUND_UP(n,d).

This patch is the result of coccinelle script
scripts/coccinelle/round.cocci

CC: qemu-block@nongnu.org
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-06-07 18:19:24 +03:00
Laurent Vivier c41a73ffaf qed: Use DIV_ROUND_UP
Replace (((n) + (d) - 1) /(d)) by DIV_ROUND_UP(n,d).

This patch is the result of coccinelle script
scripts/coccinelle/round.cocci

CC: qemu-block@nongnu.org
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-06-07 18:19:24 +03:00
Laurent Vivier d737b78cc1 qcow/qcow2: Use DIV_ROUND_UP
Replace (((n) + (d) - 1) /(d)) by DIV_ROUND_UP(n,d).

This patch is the result of coccinelle script
scripts/coccinelle/round.cocci

CC: qemu-block@nongnu.org
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-06-07 18:19:24 +03:00
Laurent Vivier 969401fe76 parallels: Use DIV_ROUND_UP
Replace (((n) + (d) - 1) /(d)) by DIV_ROUND_UP(n,d).

This patch is the result of coccinelle script
scripts/coccinelle/round.cocci

CC: qemu-block@nongnu.org
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-06-07 18:19:24 +03:00
Peter Maydell 030c98aff1 all: Remove unnecessary glib.h includes
Remove glib.h includes, as it is provided by osdep.h.

This commit was created with scripts/clean-includes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-06-07 18:19:24 +03:00
Fam Zheng c8a9fd8071 block: Drop bdrv_ioctl_bh_cb
Similar to the "!drv || !drv->bdrv_aio_ioctl" case above, here it is
okay to set co.ret and return. As pointed out by Paolo, a BH will be
created as necessary by the caller (bdrv_co_maybe_schedule_bh).
Besides, as pointed out by Kevin, "data" was leaked before.

Reported-by: Kevin Wolf <kwolf@redhat.com>
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20160601015223.19277-1-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-07 14:40:51 +01:00
Eric Blake 41574268b7 block: Move BlockRequest type to io.c
I was thrown by the fact that the public type BlockRequest had
an anonymous union, but no obvious discriminator.  Turns out
that the only client of the second branch of the union was code
internal to io.c, now that commit 91c6e4b killed public
multiwrite, so move it into io.c and improve the comments.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463699150-19445-1-git-send-email-eblake@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2016-06-07 14:40:51 +01:00
Peter Lieven 117bc3fa22 block/io: optimize bdrv_co_pwritev for small requests
in a read-modify-write cycle a small request might cause
head and tail to fall into the same aligned block. Currently
QEMU reads the same block twice in this case which is
not necessary.

Signed-off-by: Peter Lieven <pl@kamp.de>
Message-id: 1464607873-28206-1-git-send-email-pl@kamp.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-07 14:40:51 +01:00
Kevin Wolf a7944dfad0 block/io: Remove unused bdrv_aio_write_zeroes()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1464599852-15392-1-git-send-email-kwolf@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-07 14:40:51 +01:00
Peter Lieven a6b3167fa0 block/iscsi: avoid potential overflow of acb->task->cdb
at least in the path via virtio-blk the maximum size is not
restricted.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1464080368-29584-1-git-send-email-pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:11 +02:00
Kevin Wolf 4653456a5f commit: Use BlockBackend for I/O
This changes the commit block job to use the job's BlockBackend for
performing its I/O. job->bs isn't used by the commit code any more
afterwards.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-25 19:04:21 +02:00
Kevin Wolf 5c438bc68c backup: Use BlockBackend for I/O
This changes the backup block job to use the job's BlockBackend for
performing its I/O. job->bs isn't used by the backup code any more
afterwards.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-25 19:04:21 +02:00
Kevin Wolf 8543c27414 backup: Remove bs parameter from backup_do_cow()
Now that we pass the job to the function, bs is implied by that.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2016-05-25 19:04:21 +02:00
John Snow 12b3e52e48 backup: Pack Notifier within BackupBlockJob
Instead of relying on peeking at bs->job, we want to explicitly get
a reference to the job that was involved in this notifier callback.

Pack the Notifier inside of the BackupBlockJob so we can use
container_of to get a reference back to the BackupBlockJob object.

This cuts out one more case where we rely unnecessarily on bs->job.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-25 19:04:21 +02:00
Kevin Wolf 91ab688379 backup: Don't leak BackupBlockJob in error path
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2016-05-25 19:04:21 +02:00
Kevin Wolf e253f4b897 mirror: Use BlockBackend for I/O
This changes the mirror block job to use the job's BlockBackend for
performing its I/O. job->bs isn't used by the mirroring code any more
afterwards.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-25 19:04:21 +02:00
Kevin Wolf b880481579 mirror: Allow target that already has a BlockBackend
We had to forbid mirroring to a target BDS that already had a BB
attached because the node swapping at job completion would add a second
BB and we didn't support multiple BBs on a single BDS at the time. Now
we do, so we can lift the restriction.

As we allow additional BlockBackends for the target, we must expect
other users to be sending requests. There may no requests be in flight
during the graph modification, so we have to drain those users now.

The core part of this patch is a revert of commit 40365552.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-25 19:04:21 +02:00
Kevin Wolf 03e35d820d stream: Use BlockBackend for I/O
This changes the streaming block job to use the job's BlockBackend for
performing the COR reads. job->bs isn't used by the streaming code any
more afterwards.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-25 19:04:21 +02:00
Kevin Wolf 1e98fefd95 block: Make blk_co_preadv/pwritev() public
Also add trace points now that the function can be directly called.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2016-05-25 19:04:21 +02:00
Kevin Wolf b6d2e59995 block: Convert block job core to BlockBackend
This adds a new BlockBackend field to the BlockJob struct, which
coexists with the BlockDriverState while converting the individual jobs.

When creating a block job, a new BlockBackend is created on top of the
given BlockDriverState, and it is destroyed when the BlockJob ends. The
reference to the BDS is now held by the BlockBackend instead of calling
bdrv_ref/unref manually.

We have to be careful when we use bdrv_replace_in_backing_chain() in
block jobs because this changes the BDS that job->blk points to. At the
moment block jobs are too tightly coupled with their BDS, so that moving
a job to another BDS isn't easily possible; therefore, we need to just
manually undo this change afterwards.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-25 19:04:21 +02:00
Kevin Wolf 0c3169dffa block: Default to enabled write cache in blk_new()
The existing users of the function are:

1. blk_new_open(), which already enabled the write cache
2. Some test cases that don't care about the setting
3. blockdev_init() for empty drives, where the cache mode is overridden
   with the value from the options when a medium is inserted

Therefore, this patch doesn't change the current behaviour. It will be
convenient, however, for additional users of blk_new() (like block
jobs) if the most sensible WCE setting is the default.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2016-05-25 19:04:21 +02:00
Eric Blake d004bd52aa block: Rename blk_write_zeroes()
Commit 983a1600 changed the semantics of blk_write_zeroes() to
be byte-based rather than sector-based, but did not change the
name, which is an open invitation for other code to misuse the
function.  Renaming to pwrite_zeroes() makes it more in line
with other byte-based interfaces, and will help make it easier
to track which remaining write_zeroes interfaces still need
conversion.

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-25 19:04:21 +02:00
Kevin Wolf 36fe13317b block: Fix reconfiguring graph with drained nodes
When changing the BlockDriverState that a BdrvChild points to while the
node is currently drained, we must call the .drained_end() parent
callback. Conversely, when this means attaching a new node that is
already drained, we need to call .drained_begin().

bdrv_root_attach_child() takes now an opaque parameter, which is needed
because the callbacks must also be called if we're attaching a new child
to the BlockBackend when the root node is already drained, and they need
a way to identify the BlockBackend. Previously, child->opaque was set
too late and the callbacks would still see it as NULL.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-25 19:04:10 +02:00
Kevin Wolf 6820643fdb block: Make bdrv_drain() use bdrv_drained_begin/end()
Until now, bdrv_drained_begin() used bdrv_drain() internally to drain
the queue. This is kind of backwards and caused quiescing code to be
duplicated because bdrv_drained_begin() had to ensure that no new
requests come in even after bdrv_drain() returns, whereas bdrv_drain()
had to have them because it could be called from other places.

Instead move the bdrv_drain() code to bdrv_drained_begin() and make
bdrv_drain() a simple wrapper around bdrv_drained_begin/end().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-25 19:04:10 +02:00
Max Reitz 109525ad6a block: Drop errp parameter from blk_new()
blk_new() cannot fail so its Error ** parameter has become superfluous.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-25 19:04:10 +02:00
Max Reitz 5b3639371c block: Make bdrv_open() return a BDS
There are no callers to bdrv_open() or bdrv_open_inherit() left that
pass a pointer to a non-NULL BDS pointer as the first argument of these
functions, so we can finally drop that parameter and just make them
return the new BDS.

Generally, the following pattern is applied:

    bs = NULL;
    ret = bdrv_open(&bs, ..., &local_err);
    if (ret < 0) {
        error_propagate(errp, local_err);
        ...
    }

by

    bs = bdrv_open(..., errp);
    if (!bs) {
        ret = -EINVAL;
        ...
    }

Of course, there are only a few instances where the pattern is really
pure.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-25 19:04:10 +02:00
Max Reitz 28eb9b12f7 block: Drop blk_new_with_bs()
Its only caller is blk_new_open(), so we can just inline it there.

The bdrv_new_root() call is dropped in the process because we can just
let bdrv_open() create the BDS.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-25 19:04:10 +02:00
Kevin Wolf 88be7b4be4 block: Fix bdrv_next() memory leak
The bdrv_next() users all leaked the BdrvNextIterator after completing
the iteration. Simply changing bdrv_next() to free the iterator before
returning NULL at the end of list doesn't work because some callers exit
the loop before looking at all BDSes.

This patch moves the BdrvNextIterator from the heap to the stack of
the caller and switches to a bdrv_first()/bdrv_next() interface for
initialising the iterator.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-25 19:04:10 +02:00
Vadim Rozenfeld 644c6869d3 iscsi: pass SCSI status back for SG_IO
Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:46 +02:00
Peter Maydell 6bd8ab6889 Block layer patches
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJXPdcnAAoJEH8JsnLIjy/WPEoQAK5vlRYqvQrrevMJviT4ZPUX
 cGGbabOcmfTBHGAgGwRLg+vQ043Sgu14JjtNbrsoSsBwAl9eAhAVGOimiieaY3vR
 35OOUxECswArJzK8I4XRx4KhI871Yq+8kHILPoXpF8L7YU38Zqa1D5z2dcOKYrL8
 Oy5IEfd1+Qfpxg/txKIioP5BzKVpz3V9/8GRNo0iAl7c806NoYFpnM0TXsed9Fjr
 YvUn1AdGHUF0/pV6vU46Qxz4yy1Q+cuoh923z6+YvXTcwok7PbjhAQWWA0qvSTuG
 otnPKMPBhYa6g7XOPD9Mra986vs6vBEGiPS5uqXoM5FqxF4Hc9LIeHEr+3hb+m53
 NLOmGqfct0USY9r6rXsOhZQb7nZCDuhaedv33ZfgE0T0cYxIilHs5PhgFAWfthhP
 aNJYlzbJUhqhTi7CJrJcFoGbNQDxux5qtlFo43M4vz/WYYDrwu8P7O3YO+sH0jU1
 EXJnbtztQvwfsiIEbIzvBRQl3XD9QmCfYO3lRbOwdCnd3ZLy47E2bze4gV3DwzK7
 CsBr+sa49xI8LMswPxTms+A+Inndn8O0mGI32Zi4nBKapjpy5Fb4YG6z8+WPfTKp
 Il1PsSgG84wm4YxGWty/UI4DoPY+hqlIIz1CNuRRNQtZTybLgNCK8ZKYbVlRppmf
 pGPpQ8pmqkeFLmx8hecm
 =ntKz
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

# gpg: Signature made Thu 19 May 2016 16:09:27 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream: (31 commits)
  qemu-iotests: Fix regression in 136 on aio_read invalid
  qemu-iotests: Simplify 109 with unaligned qemu-img compare
  qemu-io: Fix recent UI updates
  block: clarify error message for qmp-eject
  qemu-iotests: Some more write_zeroes tests
  qcow2: Fix write_zeroes with partially allocated backing file cluster
  qcow2: fix condition in is_zero_cluster
  block: Propagate AioContext change to all children
  block: Remove BlockDriverState.blk
  block: Don't return throttling info in query-named-block-nodes
  block: Avoid bs->blk in bdrv_next()
  block: Add bdrv_has_blk()
  block: Remove bdrv_aio_multiwrite()
  blockjob: Don't touch BDS iostatus
  blockjob: Don't set iostatus of target
  block: User BdrvChild callback for device name
  block: Use BdrvChild callbacks for change_media/resize
  block: Don't check throttled reqs in bdrv_requests_pending()
  Revert "block: Forbid I/O throttling on nodes with multiple parents for 2.6"
  block: Remove bdrv_move_feature_fields()
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-19 16:54:12 +01:00
Kevin Wolf 5efdf53227 qcow2: Fix write_zeroes with partially allocated backing file cluster
In order to correctly check whether a given cluster is read as zero, we
don't only need to check whether bdrv_get_block_status_above() sets
BDRV_BLOCK_ZERO, but also if all sectors for the whole cluster have the
same status.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
2016-05-19 16:45:31 +02:00
Denis V. Lunev f575f145f4 qcow2: fix condition in is_zero_cluster
We should check for (res & BDRV_BLOCK_ZERO) only. The situation when we
will have !(res & BDRV_BLOCK_DATA) and will not have BDRV_BLOCK_ZERO is
not possible for images with bdi.unallocated_blocks_are_zero == true.

For those images where it's false, however, it can happen and we must
not consider the data zeroed then or we would corrupt the image.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-19 16:45:31 +02:00
Max Reitz b97511c7bc block: Propagate AioContext change to all children
Instead of propagating any change of a BDS's AioContext only to its file
and backing children and letting driver-specific code do the rest, just
propagate it to all and drop the thus superfluous implementations of
bdrv_{at,de}tach_aio_context() in Quorum, blkverify and VMDK.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf 1f0c461b82 block: Remove BlockDriverState.blk
This patch removes the remaining users of bs->blk, which will allow us
to have multiple BBs on top of a single BDS. In the meantime, all checks
that are currently in place to prevent the user from creating such
setups can be switched to bdrv_has_blk() instead of accessing BDS.blk.

Future patches can allow them and e.g. enable users to mirror to a block
device that already has a BlockBackend on it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf 79c719b755 block: Don't return throttling info in query-named-block-nodes
query-named-block-nodes should not return information that is related
to the attached BlockBackend rather than the node itself, so throttling
information needs to be removed from it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf 7c8eece45b block: Avoid bs->blk in bdrv_next()
We need to introduce a separate BdrvNextIterator struct that can keep
more state than just the current BDS in order to avoid using the bs->blk
pointer.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf dde33812a8 block: Add bdrv_has_blk()
In many cases we just want to know whether a BDS has at least one BB
attached, without needing to know the exact BB that is attached. In
contrast to bs->blk, this is still a valid question when more than one
BB can be attached, so just answer it by checking the parents list.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf 91c6e4b7bb block: Remove bdrv_aio_multiwrite()
Since virtio-blk implements request merging itself these days, the only
remaining users are test cases for the function. That doesn't make the
function exactly useful any more.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf 66a0fae438 blockjob: Don't touch BDS iostatus
Block jobs don't actually make use of the iostatus for their BDSes, but
they manage a separate block job iostatus. Still, they require that it
is enabled for the source BDS and they enable it automatically for the
target and set the error handling mode - which ends up never being used
by the job.

This patch removes all of the BDS iostatus handling from the block job,
which removes another few bs->blk accesses.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf 81e254dc83 blockjob: Don't set iostatus of target
When block job errors were introduced, we assigned the iostatus of the
target BDS "just in case". The field has never been accessible for the
user because the target isn't listed in query-block.

Before we can allow the user to have a second BlockBackend on the
target, we need to clean this up. If anything, we would want to set the
iostatus for the internal BB of the job (which we can always do later),
but certainly not for a separate BB which the job doesn't even use.

As a nice side effect, this gets us rid of another bs->blk use.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf 4c265bf9f4 block: User BdrvChild callback for device name
In order to get rid of bs->blk for bdrv_get_device_name() and
bdrv_get_device_or_node_name(), ask all parents for their name and
simply pick the first one.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf 5c8cab4808 block: Use BdrvChild callbacks for change_media/resize
We want to get rid of BlockDriverState.blk in order to allow multiple
BlockBackends per BDS. Converting the device callbacks in block.c (which
assume a single BlockBackend) to per-child callbacks gets us rid of the
first few instances.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf cbe1beb7a1 block: Don't check throttled reqs in bdrv_requests_pending()
Checking whether there are throttled requests requires going to the
associated BlockBackend, which we want to avoid.

All users of bdrv_requests_pending() in block/io.c already call
bdrv_parent_drained_begin() first, which restarts all throttled
requests, so no throttled requests can be left here and this is removal
of dead code.

The remaining users (assertions during graph manipulation in block.c)
don't care about requests that are still queued in the BlockBackend and
haven't been issued for a BlockDriverState yet.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf 7ca7f0f6db block: Decouple throttling from BlockDriverState
This moves the throttling related part of the BDS life cycle management
to BlockBackend. The throttling group reference is now kept even when no
medium is inserted.

With this commit, throttling isn't disabled and then re-enabled any more
during graph reconfiguration. This fixes the temporary breakage of I/O
throttling when used with live snapshots or block jobs that manipulate
the graph.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:30 +02:00
Kevin Wolf bb9aaecaf1 block/io: Quiesce parents between drained_begin/end
So far, bdrv_parent_drained_begin/end() was called for the duration of
the actual bdrv_drain() at the beginning of a drained section, but we
really should keep parents quiesced until the end of the drained
section.

This does not actually change behaviour at this point because the only
user of the .drained_begin/end BdrvChildRole callback is I/O throttling,
which already doesn't send any new requests after flushing its queue in
.drained_begin. The patch merely removes a trap for future users.

Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:30 +02:00
Kevin Wolf c2066af051 block: Drain throttling queue with BdrvChild callback
This removes the last part of I/O throttling from block/io.c and moves
it to the BlockBackend.

Instead of having knowledge about throttling inside io.c, we can call a
BdrvChild callback .drained_begin/end, which happens to drain the
throttled requests for BlockBackend parents.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:30 +02:00
Kevin Wolf 22aa8b246a block: Introduce BdrvChild.opaque
BlockBackends use it to get a back pointer from BdrvChild to
BlockBackend in any BdrvChildRole callbacks.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:30 +02:00
Kevin Wolf 97148076e8 block: Move I/O throttling configuration functions to BlockBackend
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:30 +02:00
Kevin Wolf 441565b279 block: Move actual I/O throttling to BlockBackend
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:30 +02:00
Kevin Wolf 27ccdd5259 block: Move throttling fields from BDS to BB
This patch changes where the throttling state is stored (used to be the
BlockDriverState, now it is the BlockBackend), but it doesn't actually
make it a BB level feature yet. For example, throttling is still
disabled when the BDS is detached from the BB.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:30 +02:00
Kevin Wolf 49d2165d7d block: Convert throttle_group_get_name() to BlockBackend
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:29 +02:00
Kevin Wolf 31dce3ccca block: throttle-groups: Use BlockBackend pointers internally
As a first step towards moving I/O throttling to the BlockBackend level,
this patch changes all pointers in struct ThrottleGroup from referencing
a BlockDriverState to referencing a BlockBackend.

This change is valid because we made sure that throttling can only be
enabled on BDSes which have a BB attached.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:29 +02:00
Kevin Wolf f2cd875d54 block: Introduce BlockBackendPublic
Some features, like I/O throttling, are implemented outside
block-backend.c, but still want to keep information in BlockBackend,
e.g. list entries that allow keeping a list of BlockBackends.

In order to avoid exposing the whole struct layout in the public header
file, this patch introduces an embedded public struct where such
information can be added and a pair of functions to convert between
BlockBackend and BlockBackendPublic.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:29 +02:00
Kevin Wolf a5614993d7 block: Make sure throttled BDSes always have a BB
It was already true in principle that a throttled BDS always has a BB
attached, except that the order of operations while attaching or
detaching a BDS to/from a BB wasn't careful enough.

This commit breaks graph manipulations while I/O throttling is enabled.
It would have been possible to keep things working with some temporary
hacks, but quite cumbersome, so it's not worth the hassle. We'll fix
things again in a minute.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:29 +02:00
Paolo Bonzini 58369e22cf qemu-common: stop including qemu/bswap.h from qemu-common.h
Move it to the actual users.  There are still a few includes of
qemu/bswap.h in headers; removing them is left for future work.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:28 +02:00
Peter Maydell f68419eee9 Block layer patches
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJXNIcBAAoJEH8JsnLIjy/WiUQP/Rzfo8pe7TWA2InxdcDOPsx4
 2/tHHJdVkffnNX5rdBvc0mOUZNJxej0NtJu2e63BB+ydYju//xw8gruKU7TR+Nd3
 nPNSsqk80prK3RNgWu7qymBvIkHDDcDQhlp48HKq+dxrfConXtHmoXapGsqc0S47
 xu03oC6WzSIyLf7TLytcjUmEprQSaCGOwsb/XaHAWL750fFAGcdy/K5PWBpUv6DN
 T0jZ3u4UneE1jeabRmqAwjgDJXC9l6riH9fP/ZtYhgNlNj84zlMXajUHSULhGknP
 cTGjwwg9tOvhcjTdhdRmWlvG1m0T77ZX3icfZLhcTdb/Uz68NXVqs8P25IGV9McD
 DPrb3T/M8JUoqLXJxIpxUm2Levof5v0dUF1PHmN5bT7pshcqv/1J7v8Fdtf9l9mp
 zI0+FK1TZ102C0H2F7AWYZSlo2EfNUSd02QQx6MbfDokDIlIxY+EgP1/Es5XlkqC
 wc7HrJvq+uix2zXw9bn9Vg9p/nDuxlRx+ppRRarNNRonaqTrx/1qAaas4bsqc9Gz
 H6gxw7BHybm0TZFdHqAdIonpesecYw6yWUXT/mQehbfphsmQmu/d2HvF2C9uUm4X
 O0JduBlKOTm2hMcg5qL6Gko8WaQIctdCJH/1Onts92cZnm8Vr/9zcmMgwGoCd7sE
 +t6Yg0jqpTUJwhZhIuCw
 =NbjJ
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

# gpg: Signature made Thu 12 May 2016 14:37:05 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream: (69 commits)
  qemu-iotests: iotests: fail hard if not run via "check"
  block: enable testing of LUKS driver with block I/O tests
  block: add support for encryption secrets in block I/O tests
  block: add support for --image-opts in block I/O tests
  qemu-io: Add 'write -z -u' to test MAY_UNMAP flag
  qemu-io: Add 'write -f' to test FUA flag
  qemu-io: Allow unaligned access by default
  qemu-io: Use bool for command line flags
  qemu-io: Make 'open' subcommand more like command line
  qemu-io: Add missing option documentation
  qmp: add monitor command to add/remove a child
  quorum: implement bdrv_add_child() and bdrv_del_child()
  Add new block driver interface to add/delete a BDS's child
  qemu-img: check block status of backing file when converting.
  iotests: fix the redirection order in 083
  block: Inactivate all children
  block: Drop superfluous invalidating bs->file from drivers
  block: Invalidate all children
  nbd: Simplify client FUA handling
  block: Honor BDRV_REQ_FUA during write_zeroes
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 16:33:40 +01:00
Wen Congyang 98292c61bc quorum: implement bdrv_add_child() and bdrv_del_child()
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Message-id: 1462865799-19402-3-git-send-email-xiecl.fnst@cn.fujitsu.com
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-12 15:33:23 +02:00
Fam Zheng c9e9e9c66c block: Drop superfluous invalidating bs->file from drivers
Now they are invalidated by the block layer, so it's not necessary to
do this in block drivers' implementations of .bdrv_invalidate_cache.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Eric Blake 52a4650574 nbd: Simplify client FUA handling
Now that the block layer honors per-bds FUA support, we don't
have to duplicate the fallback flush at the NBD layer.  The
static function nbd_co_writev_flags() is no longer needed, and
the driver can just directly use nbd_client_co_writev().

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Eric Blake 465fe887cc block: Honor BDRV_REQ_FUA during write_zeroes
The block layer has a couple of cases where it can lose
Force Unit Access semantics when writing a large block of
zeroes, such that the request returns before the zeroes
have been guaranteed to land on underlying media.

SCSI does not support FUA during WRITESAME(10/16); FUA is only
supported if it falls back to WRITE(10/16).  But where the
underlying device is new enough to not need a fallback, it
means that any upper layer request with FUA semantics was
silently ignoring BDRV_REQ_FUA.

Conversely, NBD has situations where it can support FUA but not
ZERO_WRITE; when that happens, the generic block layer fallback
to bdrv_driver_pwritev() (or the older bdrv_co_writev() in qemu
2.6) was losing the FUA flag.

The problem of losing flags unrelated to ZERO_WRITE has been
latent in bdrv_co_do_write_zeroes() since commit aa7bfbff, but
back then, it did not matter because there was no FUA flag.  It
became observable when commit 93f5e6d8 paved the way for flags
that can impact correctness, when we should have been using
bdrv_co_writev_flags() with modified flags.  Compare to commit
9eeb6dd, which got flag manipulation right in
bdrv_co_do_zero_pwritev().

Symptoms: I tested with qemu-io with default writethrough cache
(which is supposed to use FUA semantics on every write), and
targetted an NBD client connected to a server that intentionally
did not advertise NBD_FLAG_SEND_FUA.  When doing 'write 0 512',
the NBD client sent two operations (NBD_CMD_WRITE then
NBD_CMD_FLUSH) to get the fallback FUA semantics; but when doing
'write -z 0 512', the NBD client sent only NBD_CMD_WRITE.

The fix is do to a cleanup bdrv_co_flush() at the end of the
operation if any step in the middle relied on a BDS that does
not natively support FUA for that step (note that we don't
need to flush after every operation, if the operation is broken
into chunks based on bounce-buffer sizing).  Each BDS gains a
new flag .supported_zero_flags, which parallels the use of
.supported_write_flags but only when accessing a zero write
operation (the flags MUST be different, because of SCSI having
different semantics based on WRITE vs. WRITESAME; and also
because BDRV_REQ_MAY_UNMAP only makes sense on zero writes).

Also fix some documentation to describe -ENOTSUP semantics,
particularly since iscsi depends on those semantics.

Down the road, we may want to add a driver where its
.bdrv_co_pwritev() honors all three of BDRV_REQ_FUA,
BDRV_REQ_ZERO_WRITE, and BDRV_REQ_MAY_UNMAP, and advertise
this via bs->supported_write_flags for blocks opened by that
driver; such a driver should NOT supply .bdrv_co_write_zeroes
nor .supported_zero_flags.  But none of the drivers touched
in this patch want to do that (the act of writing zeroes is
different enough from normal writes to deserve a second
callback).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Eric Blake 4df863f336 block: Make supported_write_flags a per-bds property
Pre-patch, .supported_write_flags lives at the driver level, which
means we are blindly declaring that all block devices using a
given driver will either equally support FUA, or that we need a
fallback at the block layer.  But there are drivers where FUA
support is a per-block decision: the NBD block driver is dependent
on the remote server advertising NBD_FLAG_SEND_FUA (and has
fallback code to duplicate the flush that the block layer would do
if NBD had not set .supported_write_flags); and the iscsi block
driver is dependent on the mode sense bits advertised by the
underlying device (and is currently silently ignoring FUA requests
if the underlying device does not support FUA).

The fix is to make supported flags as a per-BDS option, set during
.bdrv_open().  This patch moves the variable and fixes NBD and iscsi
to set it only conditionally; later patches will then further
simplify the NBD driver to quit duplicating work done at the block
layer, as well as tackle the fact that SCSI does not support FUA
semantics on WRITESAME(10/16) but only on WRITE(10/16).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Denis V. Lunev 2928abce6d qcow2: improve qcow2_co_write_zeroes()
There is a possibility that qcow2_co_write_zeroes() will be called
with the partial block. This could be synthetically triggered with
    qemu-io -c "write -z 32k 4k"
and can happen in the real life in qemu-nbd. The latter happens under
the following conditions:
    (1) qemu-nbd is started with --detect-zeroes=on and is connected to the
        kernel NBD client
    (2) third party program opens kernel NBD device with O_DIRECT
    (3) third party program performs write operation with memory buffer
        not aligned to the page
In this case qcow2_co_write_zeroes() is unable to perform the operation
and mark entire cluster as zeroed and returns ENOTSUP. Thus the caller
switches to non-optimized version and writes real zeroes to the disk.

The patch creates a shortcut. If the block is read as zeroes, f.e. if
it is unallocated, the request is extended to cover full block.
User-visible situation with this block is not changed. Before the patch
the block is filled in the image with real zeroes. After that patch the
block is marked as zeroed in metadata. Thus any subsequent changes in
backing store chain are not affected.

Kevin, thank you for a cool suggestion.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Eric Blake 7b1deac84e block: Kill unused sector-based blk_* functions
Now that there are no remaining clients, we can drop the
sector-based blk_read(), blk_write(), blk_aio_readv(), and
blk_aio_writev().  Sadly, there are still remaining
sector-based interfaces, such as blk_*discard(), or
blk_write_compressed(); those will have to wait for another
day.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Eric Blake 60cb2fa7eb block: Introduce byte-based aio read/write
blk_aio_readv() and blk_aio_writev() are annoying in that they
can't access sub-sector granularity, and cannot pass flags.
Also, they require the caller to pass redundant information
about the size of the I/O (qiov->size in bytes must match
nb_sectors in sectors).

Add new blk_aio_preadv() and blk_aio_pwritev() functions to fix
the flaws. The next few patches will upgrade callers, then
finally delete the old interfaces.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:08 +02:00
Eric Blake 983a160050 block: Switch blk_*write_zeroes() to byte interface
Sector-based blk_write() should die; convert the one-off
variant blk_write_zeroes() to use an offset/count interface
instead.  Likewise for blk_co_write_zeroes() and
blk_aio_write_zeroes().

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:08 +02:00
Eric Blake b7d17f9fa4 block: Switch blk_read_unthrottled() to byte interface
Sector-based blk_read() should die; convert the one-off
variant blk_read_unthrottled().

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:08 +02:00
Eric Blake 8341f00dc2 block: Allow BDRV_REQ_FUA through blk_pwrite()
We have several block drivers that understand BDRV_REQ_FUA,
and emulate it in the block layer for the rest by a full flush.
But without a way to actually request BDRV_REQ_FUA during a
pass-through blk_pwrite(), FUA-aware block drivers like NBD are
forced to repeat the emulation logic of a full flush regardless
of whether the backend they are writing to could do it more
efficiently.

This patch just wires up a flags argument; followup patches
will actually make use of it in the NBD driver and in qemu-io.

Signed-off-by: Eric Blake <eblake@redhat.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:08 +02:00
Janne Karhunen f249924e96 Allow users to specify the vmdk virtual hardware version.
Vmdk images have metadata to indicate the vmware virtual
hardware version image was created/tested to run with.
Allow users to specify that version via new 'hwversion'
option.

[ kwolf: Adjust qemu-iotests common.filter ]

Signed-off-by: Janne Karhunen <Janne.Karhunen@gmail.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:08 +02:00
Zhou Jie ed79f37d9b block: always compile-check debug prints
Files with conditional debug statements should ensure that the printf is
always compiled. This prevents bitrot of the format string of the debug
statement. And switch debug output to stderr.

Signed-off-by: Zhou Jie <zhoujie2011@cn.fujitsu.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf e3ddef25e9 block: Remove BlockDriver.bdrv_read/write
There are no block drivers left that implement the old .bdrv_read/write
interface, so it can be removed now. This gets us rid of the
corresponding emulation functions, too.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf 4575eb496d vvfat: Implement .bdrv_co_preadv/pwritev interfaces
This doesn't really convert any of the actual vvfat logic to use
vectored I/O (and it's doubtful whether that would make sense), but
instead just adapts the wrappers to the modern interface.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf 513b0f026b vpc: Implement .bdrv_co_pwritev() interface
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf d46b7cc680 vpc: Implement .bdrv_co_preadv() interface
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf 37b1d7d8c9 vmdk: Implement .bdrv_co_pwritev() interface
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf f10cc24359 vmdk: Implement .bdrv_co_preadv() interface
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf a844a2b0d4 vmdk: Add vmdk_find_offset_in_cluster()
This is a byte granularity version of vmdk_find_index_in_cluster().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf fde9d56f5b vdi: Implement .bdrv_co_pwritev() interface
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf 0865bb6f04 vdi: Implement .bdrv_co_preadv() interface
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf 3edf1e73d5 dmg: Implement .bdrv_co_preadv() interface
This implements .bdrv_co_preadv() for the cloop block driver. While
updating the error paths, change -1 to a valid -errno code.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf 5cd230819e cloop: Implement .bdrv_co_preadv() interface
This implements .bdrv_co_preadv() for the cloop block driver. While
updating the error paths, change -1 to a valid -errno code.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf 3b8fd33011 bochs: Implement .bdrv_co_preadv() interface
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf 3fb06697ae block: Introduce .bdrv_co_preadv/pwritev BlockDriver function
Many parts of the block layer are already byte granularity. The block
driver interface, however, was still missing an interface that allows
making use of this. This patch introduces a new BlockDriver interface,
which is based on coroutines, vectored, has flags and uses a byte
granularity. This is now the preferred interface for new drivers.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf cab3a3563c block: Rename bdrv_co_do_preadv/writev to bdrv_co_preadv/writev
It used to be an internal helper function just for implementing
bdrv_co_do_readv/writev(), but now that it's a public interface, it
deserves a name without "do" in it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf 0884447382 block: Support AIO drivers in bdrv_driver_preadv/pwritev()
Instead of registering emulation functions as .bdrv_co_writev, just
directly check whether the function is there or not, and use the AIO
interface if it isn't. This makes the read/write functions more
consistent with how things are done in other places (flush, discard,
etc.)

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:07 +02:00
Kevin Wolf 78a07294d5 block: Introduce bdrv_driver_pwritev()
This is a function that simply calls into the block driver for doing a
write, providing the byte granularity interface we want to eventually
have everywhere, and using whatever interface that driver supports.

This one is a bit more interesting than the version for reads: It adds
support for .bdrv_co_writev_flags() everywhere, so that drivers
implementing this function can drop .bdrv_co_writev() now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:07 +02:00
Kevin Wolf 166fe96051 block: Introduce bdrv_driver_preadv()
This is a function that simply calls into the block driver for doing a
read, providing the byte granularity interface we want to eventually
have everywhere, and using whatever interface that driver supports.

For now, this is just a wrapper for calling bs->drv->bdrv_co_readv().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:07 +02:00
Paolo Bonzini dd7f7ed104 linux-aio: make it more type safe
Replace void* with an opaque LinuxAioState type.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Paolo Bonzini 6b98bd6495 block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end
Extract the handling of io_plug "depth" from linux-aio.c and let the
main bdrv_drain loop do nothing but wait on I/O.

Like the two newly introduced functions, bdrv_io_plug and bdrv_io_unplug
now operate on all children.  The visit order is now symmetrical between
plug and unplug, making it possible for formats to implement plug/unplug.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Paolo Bonzini ce0f141259 block: introduce bdrv_no_throttling_begin/end
Extract the handling of throttling from bdrv_flush_io_queue.  These
new functions will soon become BdrvChildRole callbacks, as they can
be generalized to "beginning of drain" and "end of drain".

Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Paolo Bonzini b6e84c97ed block: extract bdrv_drain_poll/bdrv_co_yield_to_drain from bdrv_drain/bdrv_co_drain
Do not call bdrv_drain_recurse twice in bdrv_co_drain.  A small
tweak to the logic in Fam's patch, which is harmless since no
one implements bdrv_drain anyway.  But better get it right.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Paolo Bonzini a72f641407 block: move restarting of throttled reqs to block/throttle-groups.c
We want to remove throttled_reqs from block/io.c.  This is the easy
part---hide the handling of throttled_reqs during disable/enable of
throttling within throttle-groups.c.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Paolo Bonzini 733bbc8cea block: make bdrv_start_throttled_reqs return void
The return value is unused and I am not sure why it would be useful.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Kevin Wolf 90c78624f1 block: Don't disable I/O throttling on sync requests
We had to disable I/O throttling with synchronous requests because we
didn't use to run timers in nested event loops when the code was
introduced. This isn't true any more, and throttling works just fine
even when using the synchronous API.

The removed code is in fact dead code since commit a8823a3b ('block: Use
blk_co_pwritev() for blk_write()') because I/O throttling can only be
set on the top layer, but BlockBackend always uses the coroutine
interface now instead of using the sync API emulation in block.c.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <1458660792-3035-2-git-send-email-kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Eric Blake 15c2f669e3 qapi: Split visit_end_struct() into pieces
As mentioned in previous patches, we want to call visit_end_struct()
functions unconditionally, so that visitors can release resources
tied up since the matching visit_start_struct() without also having
to worry about error priority if more than one error occurs.

Even though error_propagate() can be safely used to ignore a second
error during cleanup caused by a first error, it is simpler if the
cleanup cannot set an error.  So, split out the error checking
portion (basically, input visitors checking for unvisited keys) into
a new function visit_check_struct(), which can be safely skipped if
any earlier errors are encountered, and leave the cleanup portion
(which never fails, but must be called unconditionally if
visit_start_struct() succeeded) in visit_end_struct().

Generated code in qapi-visit.c has diffs resembling:

|@@ -59,10 +59,12 @@ void visit_type_ACPIOSTInfo(Visitor *v,
|         goto out_obj;
|     }
|     visit_type_ACPIOSTInfo_members(v, obj, &err);
|-    error_propagate(errp, err);
|-    err = NULL;
|+    if (err) {
|+        goto out_obj;
|+    }
|+    visit_check_struct(v, &err);
| out_obj:
|-    visit_end_struct(v, &err);
|+    visit_end_struct(v);
| out:

and in qapi-event.c:

@@ -47,7 +47,10 @@ void qapi_event_send_acpi_device_ost(ACP
|         goto out;
|     }
|     visit_type_q_obj_ACPI_DEVICE_OST_arg_members(v, &param, &err);
|-    visit_end_struct(v, err ? NULL : &err);
|+    if (!err) {
|+        visit_check_struct(v, &err);
|+    }
|+    visit_end_struct(v);
|     if (err) {
|         goto out;

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-20-git-send-email-eblake@redhat.com>
[Conflict with a doc fixup resolved]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-05-12 09:47:55 +02:00
Kevin Wolf d208c50d9d vvfat: Fix default volume label
Commit d5941dd documented that it leaves the default volume name as it
was ("QEMU VVFAT"), but it doesn't actually implement this. You get an
empty name (eleven space characters) instead.

This fixes the implementation to apply the advertised default.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-04-29 11:14:13 +02:00
Kevin Wolf ebb72c9f06 vvfat: Fix volume name assertion
Commit d5941dd made the volume name configurable, but it didn't consider
that the rw code compares the volume name string to assert that the
first directory entry is the volume name. This made vvfat crash in rw
mode.

This fixes the assertion to compare with the configured volume name
instead of a literal string.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-04-29 11:14:08 +02:00
Fam Zheng ab27c3b5e7 mirror: Workaround for unexpected iohandler events during completion
Commit 5a7e7a0ba moved mirror_exit to a BH handler but didn't add any
protection against new requests that could sneak in just before the
BH is dispatched. For example (assuming a code base at that commit):

        main_loop_wait # 1
          os_host_main_loop_wait
            g_main_context_dispatch
              aio_ctx_dispatch
                aio_dispatch
                  ...
                    mirror_run
                      bdrv_drain
    (a)               block_job_defer_to_main_loop
          qemu_iohandler_poll
            virtio_queue_host_notifier_read
              ...
                virtio_submit_multiwrite
    (b)           blk_aio_multiwrite

        main_loop_wait # 2
          <snip>
                aio_dispatch
                  aio_bh_poll
    (c)             mirror_exit

At (a) we know the BDS has no pending request. However, the same
main_loop_wait call is going to dispatch iohandlers (EventNotifier
events), which may lead to a new I/O from guest. So the invariant is
already broken at (c). Data loss.

Commit f3926945c8 made iohandler to use aio API.  The order of
virtio_queue_host_notifier_read and block_job_defer_to_main_loop within
a main_loop_wait becomes unpredictable, and even worse, if the host
notifier event arrives at the next main_loop_wait call, the
unpredictable order between mirror_exit and
virtio_queue_host_notifier_read is also a trouble. As shown below, this
commit made the bug easier to trigger:

    - Bug case 1:

        main_loop_wait # 1
          os_host_main_loop_wait
            g_main_context_dispatch
              aio_ctx_dispatch (qemu_aio_context)
                ...
                  mirror_run
                    bdrv_drain
    (a)             block_job_defer_to_main_loop
              aio_ctx_dispatch (iohandler_ctx)
                virtio_queue_host_notifier_read
                  ...
                    virtio_submit_multiwrite
    (b)               blk_aio_multiwrite

        main_loop_wait # 2
          ...
                aio_dispatch
                  aio_bh_poll
    (c)             mirror_exit

    - Bug case 2:

        main_loop_wait # 1
          os_host_main_loop_wait
            g_main_context_dispatch
              aio_ctx_dispatch (qemu_aio_context)
                ...
                  mirror_run
                    bdrv_drain
    (a)             block_job_defer_to_main_loop

        main_loop_wait # 2
          ...
            aio_ctx_dispatch (iohandler_ctx)
              virtio_queue_host_notifier_read
                ...
                  virtio_submit_multiwrite
    (b)             blk_aio_multiwrite
              aio_dispatch
                aio_bh_poll
    (c)           mirror_exit

In both cases, (b) breaks the invariant wanted by (a) and (c).

Until then, the request loss has been silent. Later, 3f09bfbc7b added
asserts at (c) to check the invariant (in
bdrv_replace_in_backing_chain), and Max reported an assertion failure
first visible there, by doing active committing while the guest is
running bonnie++.

2.5 added bdrv_drained_begin at (a) to protect the dataplane case from
similar problems, but we never realize the main loop bug until now.

As a bandage, this patch disables iohandler's external events
temporarily together with bs->ctx.

Launchpad Bug: 1570134

Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-22 16:44:09 +02:00
Fam Zheng 4150ae60eb mirror: Don't extend the last sub-chunk
The last sub-chunk is rounded up to the copy granularity in the target
image, resulting in a larger size than the source.

Add a function to clip the copied sectors to the end.

This undoes the "wrong" changes to tests/qemu-iotests/109.out in
e5b43573e2. The remaining two offset changes are okay.

[ kwolf: Use DIV_ROUND_UP to calculate nb_chunks now ]

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2016-04-20 16:52:55 +02:00
Max Reitz f27a274259 block/mirror: Refresh stale bitmap iterator cache
If the drive's dirty bitmap is dirtied while the mirror operation is
running, the cache of the iterator used by the mirror code may become
stale and not contain all dirty bits.

This only becomes an issue if we are looking for contiguously dirty
chunks on the drive. In that case, we can easily detect the discrepancy
and just refresh the iterator if one occurs.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-20 16:52:55 +02:00
Max Reitz 9c83625bdd block/mirror: Revive dead yielding code
mirror_iteration() is supposed to wait if the current chunk is subject
to a still in-flight mirroring operation. However, it mixed checking
this conflict situation with checking the dirty status of a chunk. A
simplification for the latter condition (the first chunk encountered is
always dirty) led to neglecting the former: We just skip the first chunk
and thus never test whether it conflicts with an in-flight operation.

To fix this, pull out the code which waits for in-flight operations on
the first chunk of the range to be mirrored to settle.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-20 16:52:55 +02:00
Jeff Cody d85fa9eb87 block/gluster: prevent data loss after i/o error
Upon receiving an I/O error after an fsync, by default gluster will
dump its cache.  However, QEMU will retry the fsync, which is especially
useful when encountering errors such as ENOSPC when using the werror=stop
option.  When using caching with gluster, however, the last written data
will be lost upon encountering ENOSPC.  Using the write-behind-cache
xlator option of 'resync-failed-syncs-after-fsync' should cause gluster
to retain the cached data after a failed fsync, so that ENOSPC and other
transient errors are recoverable.

Unfortunately, we have no way of knowing if the
'resync-failed-syncs-after-fsync' xlator option is supported, so for now
close the fd and set the BDS driver to NULL upon fsync error.

Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-04-19 12:24:59 -04:00
Jeff Cody 5d4343e6c2 block/gluster: code movement of qemu_gluster_close()
Move qemu_gluster_close() further up in the file, in preparation
for the next patch, to avoid a forward declaration.

Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-04-19 12:24:59 -04:00
Jeff Cody a882745356 block/gluster: return correct error value
Upon error, gluster will call the aio callback function with a
ret value of -1, with errno set to the proper error value.  If
we set the acb->ret value to the return value in the callback,
that results in every error being EPERM (i.e. 1).  Instead, set
it to the proper error result.

Reviewed-by: Niels de Vos <ndevos@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-04-19 12:24:59 -04:00
Kevin Wolf 16aaf975ee block: Don't ignore flags in blk_{,co,aio}_write_zeroes()
Commit 57d6a428 neglected to pass the given flags to blk_aio_prwv(),
which broke discard by WRITE SAME for scsi-disk (the UNMAP bit would be
ignored).

Commit fc1453cd introduced the same bug for blk_write_zeroes(). This is
used for 'qemu-img convert' without has_zero_init (e.g. on a block
device) and for preallocation=falloc in parallels.

Commit 8896e088 is the version for blk_co_write_zeroes(). This function
is only used in qemu-io.

Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-04-15 17:22:12 +02:00
Jeff Cody 9c057d0b68 block/vpc: update comments to be compliant w/coding guidelines
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-15 17:22:12 +02:00
Jeff Cody 32f6439cf7 block/vpc: set errp in vpc_open
Add more useful error information to failure paths in vpc_open

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-15 17:22:12 +02:00
Jeff Cody 66176fc6a7 block/vpc: make checks on max table size a bit more lax
The check on the max_table_size field not being larger than required is
valid, and in accordance with the VHD spec.  However, there have been
VHD images encountered in the wild that have an out-of-spec max table
size that is technically too large.

There is no issue in allowing this larger table size, as we also
later verify that the computed size (used for the pagetable) is
large enough to fit all sectors.  In addition, max_table_entries
is bounds checked against SIZE_MAX and INT_MAX.

Remove the strict check, so that we can accomodate these sorts of
images that are benignly out of spec.

Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Reported-by: Grant Wu <grantwwu@gmail.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-15 17:22:12 +02:00
Jeff Cody c23fb11bbb block/vpc: Use the correct max sector count for VHD images
The old VHD_MAX_SECTORS value is incorrect, and is a throwback
to the CHS calculations.  The VHD specification allows images up to 2040
GiB, which (using 512 byte sectors) corresponds to a maximum number of
sectors of 0xff000000, rather than the old value of 0xfe0001ff.

Update VHD_MAX_SECTORS to reflect the correct value.

Also, update comment references to the actual size limit, and correct
one compare so that we can have sizes up to the limit.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-15 17:22:12 +02:00
Jeff Cody bab246db1d block/vpc: use current_size field for XenConverter VHD images
XenConverter VHD images are another VHD image where current_size is
different from the CHS values in the the format header.  Use
current_size as the default, by looking at the creator_app signature
field.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-15 17:22:12 +02:00
Stefan Hajnoczi 9bdfb9e8ac vpc: use current_size field for XenServer VHD images
The vpc driver has two methods of determining virtual disk size.  The
correct one to use depends on the software that generated the image
file.  Add the XenServer creator_app signature so that image size is
correctly detected for those images.

Reported-by: Grant Wu <grantwwu@gmail.com>
Reported-by: Spencer Baugh <sbaugh@catern.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-15 17:22:12 +02:00
Jeff Cody 0211b9becc block/vpc: set errp in vpc_create
Add more useful error information to failure paths in vpc_create().

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-15 17:22:11 +02:00
Kevin Wolf 7fa84cd8d4 block: Fix blk_aio_write_zeroes()
Commit 57d6a428 broke blk_aio_write_zeroes() because in some write
functions in the call path don't have an explicit length argument but
reuse qiov->size instead. Which is great, except that write_zeroes
doesn't have a qiov, which this commit interprets as 0 bytes.
Consequently, blk_aio_write_zeroes() didn't effectively do anything.

This patch introduces an explicit acb->bytes in BlkAioEmAIOCB and uses
that instead of acb->rwco.size.

The synchronous version of the function is okay because it does pass a
qiov (with the right size and a NULL pointer as its base).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-04-15 17:22:11 +02:00
Max Reitz 4e876bcf2b qcow2: Prevent backing file names longer than 1023
We reject backing file names with a length of more than 1023 characters
when opening a qcow2 file, so we should not produce such files
ourselves.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-12 18:06:51 +02:00
Paolo Bonzini 40a99aace3 vpc: fix return value check for blk_pwrite
bdrv_pwrite_sync used to return zero or negative error, while blk_pwrite returns
the number of written bytes when successful.  This caused VPC image creation
to fail spectacularly: it wrote the first 512 bytes, and then exited immediately
because of the non-zero answer from blk_pwrite.  But the truly spectacular part
is that it returns a positive value (the 512 that blk_pwrite returned) causing
everyone to believe that it succeeded.

This fixes qemu-iotests with vpc format.

Fixes: b8f45cdf78
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-12 18:06:51 +02:00
Fam Zheng 39bf92dd70 mirror: Replace bdrv_drain(bs) with bdrv_co_drain(bs)
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1459855253-5378-3-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-04-11 16:59:09 +01:00
Fam Zheng a77fd4bb29 block: Fix bdrv_drain in coroutine
Using the nested aio_poll() in coroutine is a bad idea. This patch
replaces the aio_poll loop in bdrv_drain with a BH, if called in
coroutine.

For example, the bdrv_drain() in mirror.c can hang when a guest issued
request is pending on it in qemu_co_mutex_lock().

Mirror coroutine in this case has just finished a request, and the block
job is about to complete. It calls bdrv_drain() which waits for the
other coroutine to complete. The other coroutine is a scsi-disk request.
The deadlock happens when the latter is in turn pending on the former to
yield/terminate, in qemu_co_mutex_lock(). The state flow is as below
(assuming a qcow2 image):

  mirror coroutine               scsi-disk coroutine
  -------------------------------------------------------------
  do last write

    qcow2:qemu_co_mutex_lock()
    ...
                                 scsi disk read

                                   tracked request begin

                                   qcow2:qemu_co_mutex_lock.enter

    qcow2:qemu_co_mutex_unlock()

  bdrv_drain
    while (has tracked request)
      aio_poll()

In the scsi-disk coroutine, the qemu_co_mutex_lock() will never return
because the mirror coroutine is blocked in the aio_poll(blocking=true).

With this patch, the added qemu_coroutine_yield() allows the scsi-disk
coroutine to make progress as expected:

  mirror coroutine               scsi-disk coroutine
  -------------------------------------------------------------
  do last write

    qcow2:qemu_co_mutex_lock()
    ...
                                 scsi disk read

                                   tracked request begin

                                   qcow2:qemu_co_mutex_lock.enter

    qcow2:qemu_co_mutex_unlock()

  bdrv_drain.enter
>   schedule BH
>   qemu_coroutine_yield()
>                                  qcow2:qemu_co_mutex_lock.return
>                                  ...
                                   tracked request end
    ...
    (resumed from BH callback)
  bdrv_drain.return
  ...

Reported-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1459855253-5378-2-git-send-email-famz@redhat.com
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-04-11 16:59:09 +01:00
Peter Maydell 31370dbe5d Block layer patches for 2.6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJXA9qJAAoJEH8JsnLIjy/W+W8P/2xTXH8+h0qurUvv5Rz6HUbD
 HHNlGnqa3M5yMLMqmtlb0J/9dj0pTlGNkp7d/9blh3MlZH/ZpeUMtq8Lro23YhUF
 0J94sRKGhK3T5GqYSA/BFbVvXQJ3yX7cKcYaQjmh7rK6Ua+65Mv/dulci+jbfGuu
 BkiVgumGAalSeaFqXZR685g61ZHbz+mQJnd3VFcvletnPBu0j1GMkuU0THAcy09q
 CTUjwWlL9CHu1lYkAa0KxgFtj6mZ+gEu5ws5Lvk8yFtSB+af/mJtzoHuq/7+Ske2
 7SiVXotFW8kR7ic1TnWiEku8+31FSBVJp6xUcRVTDOHVG7oSQxBDg5bGwPn8TxXy
 bvLvTJDIFodGhkiDTuGLuttvX+U2xCl4GmBS01OiFF53UGWgjjY+pkDZiaNC4nFW
 vwItj7/KGKL2Nq6cVfGCDOYYjFtHAPGI3yyJ2babXecv+9nKr0WeJpk6cfKVnP17
 rZs28Y3Ub/P2M4oOt4YdhRSanQZbe5eIQOsdfWX4q12hujL0zbsCtV6dpjeTsY74
 J4CBLzBYCj6y9Jc8R+D6XLYJBtJQGaSj99Oqe9WdUuHEqTGQt4HMuYHAj77wImVG
 ccURYiBpmB+FChLhG+yIlo1PHS0kpgeD+ZkZDHC0gYdiFqdKnFeQ7lj3Jj2tSWXY
 7Y7qbaOhtXp+20M6oO+G
 =ax/b
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches for 2.6

# gpg: Signature made Tue 05 Apr 2016 16:32:25 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream:
  crypto: Avoid memory leak on failure
  qemu-iotests: 149: Use "/usr/bin/env python"
  block: Forbid I/O throttling on nodes with multiple parents for 2.6
  block: forbid x-blockdev-del from acting on DriveInfo

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-05 17:03:32 +01:00
Eric Blake 95c3df5a24 crypto: Avoid memory leak on failure
Commit 7836857 introduced a memory leak due to invalid use of
Error vs. visit_type_end().  If visiting the intermediate
members fails, we clear the error and unconditionally use
visit_end_struct() on the same error object; but if that
cleanup succeeds, we then skip the qapi_free call.

Until a later patch adds visit_check_struct(), the only safe
approach is to use two separate error objects.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1459526222-30052-1-git-send-email-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-04-05 17:23:21 +02:00
Eric Blake a89ef0c357 nbd: don't request FUA on FLUSH
The NBD protocol does not clearly document what will happen
if a client sends NBD_CMD_FLAG_FUA on NBD_CMD_FLUSH.
Historically, both the qemu and upstream NBD servers silently
ignored that flag, but that feels a bit risky.  Meanwhile, the
qemu NBD client unconditionally sends the flag (without even
bothering to check whether the caller cares; at least with
NBD_CMD_WRITE the client only sends FUA if requested by a
higher layer).

There is ongoing discussion on the NBD list to fix the
protocol documentation to require that the server MUST ignore
the flag (unless the kernel folks can better explain what FUA
means for a flush), but until those doc improvements land, the
current nbd.git master was recently changed to reject the flag
with EINVAL (see nbd commit ab22e082), which now makes it
impossible for a qemu client to use FLUSH with an upstream NBD
server.

We should not send FUA with flush unless the upstream protocol
documents what it will do, and even then, it should be something
that the caller can opt into, rather than being unconditional.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1459526902-32561-1-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-05 11:46:52 +02:00
Stefan Hajnoczi 0d94b74655 block/nfs: add missing #include "qemu/cutils.h"
parse_uint_full() used to be included from qemu-common.h but was moved
to qemu/cutils.h in commit f348b6d1a5
("util: move declarations out of qemu-common.h").

Cc: Veronia Bahaa <veroniabahaa@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1459341994-20567-3-git-send-email-stefanha@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-03-30 16:50:39 -04:00
Stefan Hajnoczi d165b8cb8b block/nfs: add missing #include "qapi/error.h"
error_setg() used to be included indirectly through qemu/osdep.h.  Since
commit da34e65cb4 ("include/qemu/osdep.h:
Don't include qapi/error.h") it requires an explicit include.

Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1459341994-20567-2-git-send-email-stefanha@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-03-30 16:50:39 -04:00
Max Reitz a90639270d block/null-{co,aio}: Implement get_block_status()
Signed-off-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:16:04 +02:00
Max Reitz cd219eb1e5 block/null-{co,aio}: Allow reading zeroes
This is optional so that it does not impede the null block driver's
performance unless this behavior is desired.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:16:03 +02:00
Kevin Wolf 09cf9db1bc block: Remove bdrv_(set_)enable_write_cache()
The only remaining users were block jobs (mirror and backup) which
unconditionally enabled WCE on the BlockBackend of the target image. As
these block jobs don't go through BlockBackend for their I/O requests,
they aren't affected by this setting anyway but always get a writeback
mode, so that call can be removed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:03 +02:00
Kevin Wolf 61de4c6808 block: Remove BDRV_O_CACHE_WB
The previous patches have successively made blk->enable_write_cache the
true source for the information whether a writethrough mode must be
implemented. The corresponding BDRV_O_CACHE_WB is only useless baggage
we're carrying around, so now's the time to remove it.

At the same time, we remove the 'cache.writeback' option parsing on the
BDS level as the only effect was setting the BDRV_O_CACHE_WB flag.

This change requires test cases that explicitly enabled the option to
drop it. Other than that and the change of the error message when
writethrough is enabled on the BDS level (from "Can't set writethrough
mode" to "doesn't support the option"), there should be no change in
behaviour.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:03 +02:00
Kevin Wolf 5481531154 raw: Support BDRV_REQ_FUA
Pass through the FUA flag to the lower layer so that the separate flush
can be saved in practically relevant cases where a (raw) format driver
sits on top of the protocol driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:02 +02:00
Kevin Wolf 2b556518c3 nbd: Support BDRV_REQ_FUA
The NBD server already used to send a FUA flag when the writethrough
mode was set. This code was a remnant from the times where protocol
drivers actually had to implement writethrough modes. Since nowadays the
block layer sends flushes in writethrough mode and non-root nodes are
always writeback, this was mostly dead code - only mostly because if NBD
was configured to be used without a format, we sent _both_ FUA and an
explicit flush afterwards, which makes the code not technically dead,
but useless overhead.

This patch changes the code so that the block layer's FUA flag is
recognised and translated into a NBD FUA flag. The additional flush is
avoided now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:02 +02:00
Kevin Wolf 9f0eb9e129 iscsi: Support BDRV_REQ_FUA
This replaces the existing hack in the iscsi driver that sent the FUA
bit in writethrough mode and ignored the following flush in order to
optimise the number of roundtrips (see commit 73b5394e).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:02 +02:00
Kevin Wolf 93f5e6d88a block: Introduce bdrv_co_writev_flags()
This function will allow drivers to implement BDRV_REQ_FUA natively
instead of sending a separate flush after the write.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:02 +02:00
Kevin Wolf c83f9fba2a block/qapi: Use blk_enable_write_cache()
Now that WCE is handled on the BlockBackend level, the flag is
meaningless for BDSes. As the schema requires us to fill the field,
we return an enabled write cache for them.

Note that this means that querying the BlockBackend name may return
writethrough as the cache information, whereas querying the node-name of
the root of that same BlockBackend will return writeback.

This may appear odd at first, but it actually makes sense because it
correctly repesents the layer that implements the WCE handling. This
becomes more apparent when you consider nodes that are the root node of
multiple BlockBackends, where each BB can have its own WCE setting.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:02 +02:00
Kevin Wolf bfd18d1e0b block: Move enable_write_cache to BB level
Whether a write cache is used or not is a decision that concerns the
user (e.g. the guest device) rather than the backend. It was already
logically part of the BB level as bdrv_move_feature_fields() always kept
it on top of the BDS tree; with this patch, the core of it (the actual
flag and the additional flushes) is also implemented there.

Direct callers of bdrv_open() must pass BDRV_O_CACHE_WB now if bs
doesn't have a BlockBackend attached.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:02 +02:00
Kevin Wolf 855a6a93a1 block: Handle flush error in bdrv_pwrite_sync()
We don't want to silently ignore a flush error.

Also, there is little point in avoiding the flush for writethrough modes
and once WCE is moved to the BB layer, we definitely need the flush here
because bdrv_pwrite() won't involve one any more.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:01 +02:00
Kevin Wolf 72e775c7d9 block: Always set writeback mode in blk_new_open()
All callers of blk_new_open() either don't rely on the WCE bit set after
blk_new_open() because they explicitly set it anyway, or they pass
BDRV_O_CACHE_WB unconditionally.

This patch changes blk_new_open() so that it always enables writeback
mode and asserts that BDRV_O_CACHE_WB is clear. For those callers that
used to pass BDRV_O_CACHE_WB unconditionally, the flag is removed now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:01 +02:00
Pavel Dovgalyuk 63785678f3 replay: introduce block devices record/replay
This patch introduces block driver that implement recording
and replaying of block devices' operations.
All block completion operations are added to the queue.
Queue is flushed at checkpoints and information about processed requests
is recorded to the log. In replay phase the queue is matched with
events read from the log. Therefore block devices requests are processed
deterministically.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
[ kwolf: Rebased onto modified and already applied part of the series ]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:15:57 +02:00
Pavel Dovgalyuk c32b82afaf block: add flush callback
This patch adds callback for flush request. This callback is responsible
for flushing whole block devices stack. bdrv_flush function does not
proceed to underlying devices. It should be performed by this callback
function, if needed.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:12:15 +02:00
Daniel P. Berrange e6ff69bf5e block: move encryption deprecation warning into qcow code
For a couple of releases we have been warning

  Encrypted images are deprecated
  Support for them will be removed in a future release.
  You can use 'qemu-img convert' to convert your image to an unencrypted one.

This warning was issued by system emulators, qemu-img, qemu-nbd
and qemu-io. Such a broad warning was issued because the original
intention was to rip out all the code for dealing with encryption
inside the QEMU block layer APIs.

The new block encryption framework used for the LUKS driver does
not rely on the unloved block layer API for encryption keys,
instead using the QOM 'secret' object type. It is thus no longer
appropriate to warn about encryption unconditionally.

When the qcow/qcow2 drivers are converted to use the new encryption
framework too, it will be practical to keep AES-CBC support present
for use in qemu-img, qemu-io & qemu-nbd to allow for interoperability
with older QEMU versions and liberation of data from existing encrypted
qcow2 files.

This change moves the warning out of the generic block code and
into the qcow/qcow2 drivers. Further, the warning is set to only
appear when running the system emulators, since qemu-img, qemu-io,
qemu-nbd are expected to support qcow2 encryption long term now that
the maint burden has been eliminated.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:12:15 +02:00
Daniel P. Berrange 78368575a6 block: add generic full disk encryption driver
Add a block driver that is capable of supporting any full disk
encryption format. This utilizes the previously added block
encryption code, and at this time supports the LUKS format.

The driver code is capable of supporting any format supported
by the QCryptoBlock module, so it registers one block driver
for each format. This patch only registers the "luks" driver
since the "qcow" driver is there only for back-compatibility
with existing qcow built-in encryption.

New LUKS compatible volumes can be formatted using qemu-img
with defaults for all settings.

$ qemu-img create --object secret,data=123456,id=sec0 \
      -f luks -o key-secret=sec0 demo.luks 10G

Alternatively the cryptographic settings can be explicitly
set

$ qemu-img create --object secret,data=123456,id=sec0 \
      -f luks -o key-secret=sec0,cipher-alg=aes-256,\
                 cipher-mode=cbc,ivgen-alg=plain64,hash-alg=sha256 \
      demo.luks 10G

And query its size

$ qemu-img info demo.img
image: demo.img
file format: luks
virtual size: 10G (10737418240 bytes)
disk size: 132K
encrypted: yes

Note that it was not necessary to provide the password
when querying info for the volume. The password is only
required when performing I/O on the volume

All volumes created by this new 'luks' driver should be
capable of being opened by the kernel dm-crypt driver.

The only algorithms listed in the LUKS spec that are
not currently supported by this impl are sha512 and
ripemd160 hashes and cast6 cipher.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
[ kwolf - Added #include to resolve conflict with da34e65c ]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:11:26 +02:00
Daniel P. Berrange abb06c5ac1 block: add flag to indicate that no I/O will be performed
When opening an image it is useful to know whether the caller
intends to perform I/O on the image or not. In the case of
encrypted images this will allow the block driver to avoid
having to prompt for decryption keys when we merely want to
query header metadata about the image. eg qemu-img info

This flag is enforced at the top level only, since even if
we don't want todo I/O on the 'qcow2' file payload, the
underlying 'file' driver will still need todo I/O to read
the qcow2 header, for example.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 11:59:32 +02:00
Max Reitz 5430215699 block/qapi: Pass bdrv_query_blk_stats() s->stats
bdrv_query_blk_stats() does not need access to all of BlockStats,
BlockDeviceStats is enough and is what this function is actually
supposed to fill.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 11:59:32 +02:00
Max Reitz 0e8f44bee9 block/qapi: Set s->device in bdrv_query_stats()
This is the only instance of bdrv_query_blk_stats() accessing anything
in the BlockStats structure other than s->stats, so let us move it to
its caller (where it makes just as much sense) allowing us to make
bdrv_query_blk_stats() take a pointer to the BlockDeviceStats instead of
BlockStats.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 11:59:32 +02:00
Peter Xu 5eda622768 block/qapi: fix unbounded stack for dump_qdict
Using heap instead of stack for better safety.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 11:59:32 +02:00
Peter Xu 853ccfed8f block/qapi: make two printf() formats literal
Fix two places to use literal printf format when possible.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 11:59:32 +02:00
Kevin Wolf 72f41b6fbd block: Remove blk_set_bs()
The function is unused since commit f21d96d0 ('block: Use BdrvChild in
BlockBackend').

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-03-30 11:59:32 +02:00
Programmingkid d0855f1235 block/raw-posix.c: Make physical devices usable in QEMU under Mac OS X host
Mac OS X can be picky when it comes to allowing the user
to use physical devices in QEMU. Most mounted volumes
appear to be off limits to QEMU. If an issue is detected,
a message is displayed showing the user how to unmount a
volume. Now QEMU uses both CD and DVD media.

Signed-off-by: John Arbuckle <programmingkidx@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 11:59:32 +02:00
Peter Maydell 553934db66 -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
 
 iQIcBAABAgAGBQJW+dDJAAoJEL2+eyfA3jBXk6oP/R/zX4foUVFMTvDbxHwWc41t
 gXGk1BpIjFnteab/tzUBDIdgs/DPxzM6bClhe45gNInVBgnOyeVmpUwRGGNYQKbn
 FdkrAcC6Vy6BJv+xRTMMS+h4i6ebJ6HqqQPwkz0VulxsAknDPQsBebe0tM8uO7k9
 G+ccMYOyUUiGTIRC3pBkRCu8APEialPSv3MpUTMtp71R3US+pEwmo1AgyOFq/lDu
 B/8LUBoR48XCEGfOA6ZixzoMwF1lTWpezx5/KF+fQ26sgnNzjpwYWnJk+LG7Gtvj
 8PHYsHDoXSISlIgxzLpS0AA6s54+mutgIeNJG5FBXGrSSNlAB1+cKZsnZw42YjfI
 BVIHQkmcGT+h9UEDdekiOfQorypSYRm51ueTGO/lUbxNifvJ5LQA97F0G/filoCj
 ovGIfOwgpWaEBPCb//U1TRGhhTg+dNyCeC4GoxDEFyWmLPYp8p7Xtz+vsZOIdH4O
 Wl9i6BzzeNEgJyutKqn2qpNLl6Pfd548MOJJqAUkGxDGrCJMkmn2lJSpSSji6cdm
 y4Az/tPY0/xpxwjSRakaIMOlhDoGXmrQG+I6JG1TZLSH7x1+Ajhr2ryx4CBONceV
 1quibAqoG1GwxCyYn7dv4aeJrDlg3XzEWQW6nJhuE91d9ZH+jF5u2+i+IZcQCDBe
 Cd6d0SZlcOnq3M5LiOrA
 =1ekF
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging

# gpg: Signature made Tue 29 Mar 2016 01:48:09 BST using RSA key ID C0DE3057
# gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>"
# gpg:                 aka "Jeffrey Cody <jeff@codyprime.org>"
# gpg:                 aka "Jeffrey Cody <codyprime@gmail.com>"

* remotes/cody/tags/block-pull-request:
  qemu-iotests: add no-op streaming test
  qemu-iotests: fix test_stream_partial()
  block: never cancel a streaming job without running stream_complete()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-29 19:54:49 +01:00
Alberto Garcia 6578629e08 block: never cancel a streaming job without running stream_complete()
We need to call stream_complete() in order to do all the necessary
clean-ups, even if there's an early failure. At the moment it's only
useful to make sure that s->backing_file_str is not leaked, but it
will become more important if we introduce support for streaming to
any intermediate node.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 2abedf2debc65c250560237f31a8e6756883c8fc.1458566441.git.berto@igalia.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-03-28 13:56:44 -04:00
Veronia Bahaa f348b6d1a5 util: move declarations out of qemu-common.h
Move declarations out of qemu-common.h for functions declared in
utils/ files: e.g. include/qemu/path.h for utils/path.c.
Move inline functions out of qemu-common.h and into new files (e.g.
include/qemu/bcd.h)

Signed-off-by: Veronia Bahaa <veroniabahaa@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:17 +01:00
Rutuja Shah 73bcb24d93 Replaced get_tick_per_sec() by NANOSECONDS_PER_SECOND
This patch replaces get_ticks_per_sec() calls with the macro
NANOSECONDS_PER_SECOND. Also, as there are no callers, get_ticks_per_sec()
is then removed.  This replacement improves the readability and
understandability of code.

For example,

    timer_mod(fdctrl->result_timer,
	      qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + (get_ticks_per_sec() / 50));

NANOSECONDS_PER_SECOND makes it obvious that qemu_clock_get_ns
matches the unit of the expression on the right side of the plus.

Signed-off-by: Rutuja Shah <rutu.shah.26@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:17 +01:00
Markus Armbruster daf015ef5a include/qemu/iov.h: Don't include qemu-common.h
qemu-common.h should only be included by .c files.  Its file comment
explains why: "No header file should depend on qemu-common.h, as this
would easily lead to circular header dependencies."

qemu/iov.h includes qemu-common.h for QEMUIOVector stuff.  Move all
that to qemu/iov.h and drop the ill-advised include.  Include
qemu/iov.h where the QEMUIOVector stuff is now missing.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:16 +01:00
Markus Armbruster da34e65cb4 include/qemu/osdep.h: Don't include qapi/error.h
Commit 57cb38b included qapi/error.h into qemu/osdep.h to get the
Error typedef.  Since then, we've moved to include qemu/osdep.h
everywhere.  Its file comment explains: "To avoid getting into
possible circular include dependencies, this file should not include
any other QEMU headers, with the exceptions of config-host.h,
compiler.h, os-posix.h and os-win32.h, all of which are doing a
similar job to this file and are under similar constraints."
qapi/error.h doesn't do a similar job, and it doesn't adhere to
similar constraints: it includes qapi-types.h.  That's in excess of
100KiB of crap most .c files don't actually need.

Add the typedef to qemu/typedefs.h, and include that instead of
qapi/error.h.  Include qapi/error.h in .c files that need it and don't
get it now.  Include qapi-types.h in qom/object.h for uint16List.

Update scripts/clean-includes accordingly.  Update it further to match
reality: replace config.h by config-target.h, add sysemu/os-posix.h,
sysemu/os-win32.h.  Update the list of includes in the qemu/osdep.h
comment quoted above similarly.

This reduces the number of objects depending on qapi/error.h from "all
of them" to less than a third.  Unfortunately, the number depending on
qapi-types.h shrinks only a little.  More work is needed for that one.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
[Fix compilation without the spice devel packages. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:15 +01:00
Eric Blake 32bafa8fdd qapi: Don't special-case simple union wrappers
Simple unions were carrying a special case that hid their 'data'
QMP member from the resulting C struct, via the hack method
QAPISchemaObjectTypeVariant.simple_union_type().  But by using
the work we started by unboxing flat union and alternate
branches, coupled with the ability to visit the members of an
implicit type, we can now expose the simple union's implicit
type in qapi-types.h:

| struct q_obj_ImageInfoSpecificQCow2_wrapper {
|     ImageInfoSpecificQCow2 *data;
| };
|
| struct q_obj_ImageInfoSpecificVmdk_wrapper {
|     ImageInfoSpecificVmdk *data;
| };
...
| struct ImageInfoSpecific {
|     ImageInfoSpecificKind type;
|     union { /* union tag is @type */
|         void *data;
|-        ImageInfoSpecificQCow2 *qcow2;
|-        ImageInfoSpecificVmdk *vmdk;
|+        q_obj_ImageInfoSpecificQCow2_wrapper qcow2;
|+        q_obj_ImageInfoSpecificVmdk_wrapper vmdk;
|     } u;
| };

Doing this removes asymmetry between QAPI's QMP side and its
C side (both sides now expose 'data'), and means that the
treatment of a simple union as sugar for a flat union is now
equivalent in both languages (previously the two approaches used
a different layer of dereferencing, where the simple union could
be converted to a flat union with equivalent C layout but
different {} on the wire, or to an equivalent QMP wire form
but with different C representation).  Using the implicit type
also lets us get rid of the simple_union_type() hack.

Of course, now all clients of simple unions have to adjust from
using su->u.member to using su->u.member.data; while this touches
a number of files in the tree, some earlier cleanup patches
helped minimize the change to the initialization of a temporary
variable rather than every single member access.  The generated
qapi-visit.c code is also affected by the layout change:

|@@ -7393,10 +7393,10 @@ void visit_type_ImageInfoSpecific_member
|     }
|     switch (obj->type) {
|     case IMAGE_INFO_SPECIFIC_KIND_QCOW2:
|-        visit_type_ImageInfoSpecificQCow2(v, "data", &obj->u.qcow2, &err);
|+        visit_type_q_obj_ImageInfoSpecificQCow2_wrapper_members(v, &obj->u.qcow2, &err);
|         break;
|     case IMAGE_INFO_SPECIFIC_KIND_VMDK:
|-        visit_type_ImageInfoSpecificVmdk(v, "data", &obj->u.vmdk, &err);
|+        visit_type_q_obj_ImageInfoSpecificVmdk_wrapper_members(v, &obj->u.vmdk, &err);
|         break;
|     default:
|         abort();

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1458254921-17042-13-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-18 10:29:26 +01:00
Alberto Garcia 6049490df4 quorum: Emit QUORUM_REPORT_BAD for reads in fifo mode
If there's an I/O error in one of Quorum children then QEMU
should emit QUORUM_REPORT_BAD. However this is not working with
read-pattern=fifo. This patch fixes this problem.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: d57e39e8d3e8564003a1e2aadbd29c97286eb2d2.1458034554.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-03-17 16:43:30 +01:00
Kevin Wolf 8896e08814 block: Use blk_co_pwritev() in blk_co_write_zeroes()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 16:30:00 +01:00
Kevin Wolf 57d6a42883 block: Use blk_aio_prwv() for aio_read/write/write_zeroes
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 16:30:00 +01:00
Kevin Wolf a55d3fba99 block: Use blk_prw() in blk_pread()/blk_pwrite()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Kevin Wolf fc1453cdfc block: Use blk_co_pwritev() in blk_write_zeroes()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Kevin Wolf 5bd5119667 block: Pull up blk_read_unthrottled() implementation
Use blk_read(), so that it goes through blk_co_preadv() like all read
requests from the BB to the BDS.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Kevin Wolf a8823a3bfd block: Use blk_co_pwritev() for blk_write()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Kevin Wolf 1bf1cbc91f block: Use blk_co_preadv() for blk_read()
This patch introduces blk_co_preadv() as a central function on the
BlockBackend level that is supposed to handle all read requests from the
BB to its root BDS eventually.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Kevin Wolf f21d96d04b block: Use BdrvChild in BlockBackend
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Max Reitz 981f4f578e block: Add blk_next_root_bs()
This function iterates over all BDSs attached to a BB. We are going to
need it when rewriting bdrv_next() so it no longer uses bdrv_states.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz fe1a9cbc33 block: Move some bdrv_*_all() functions to BB
Move bdrv_commit_all() and bdrv_flush_all() to the BlockBackend level.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz 7c735873d9 blockdev: Remove blk_hide_on_behalf_of_hmp_drive_del()
We can basically inline it in hmp_drive_del(); monitor_remove_blk() is
called already, so we just need to call bdrv_make_anon(), too.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz efaa7c4eeb blockdev: Split monitor reference from BB creation
Before this patch, blk_new() automatically assigned a name to the new
BlockBackend and considered it referenced by the monitor. This patch
removes the implicit monitor_add_blk() call from blk_new() (and
consequently the monitor_remove_blk() call from blk_delete(), too) and
thus blk_new() (and related functions) no longer take a BB name
argument.

In fact, there is only a single point where blk_new()/blk_new_open() is
called and the new BB is monitor-owned, and that is in blockdev_init().
Besides thus relieving us from having to invent names for all of the BBs
we use in qemu-img, this fixes a bug where qemu cannot create a new
image if there already is a monitor-owned BB named "image".

If a BB and its BDS tree are created in a single operation, as of this
patch the BDS tree will be created before the BB is given a name
(whereas it was the other way around before). This results in minor
change to the output of iotest 087, whose reference output is amended
accordingly.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz e5e785500b blockdev: Separate BB name management
Introduce separate functions (monitor_add_blk() and
monitor_remove_blk()) which set or unset a BB name. Since the name is
equivalent to the monitor's reference to a BB, adding a name the same as
declaring the BB to be monitor-owned and removing it revokes this
status, hence the function names.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz 2cf22d6a1a blockdev: Add list of all BlockBackends
While monitor_block_backends contains nearly all BBs, we sometimes
really need all BBs. To this end, this patch adds the block_backend
list.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz 9492b0b928 blockdev: Rename blk_backends
The blk_backends list does not contain all BlockBackends but only the
ones which are referenced by the monitor, and that is not necessarily
true for every BlockBackend. Rename the list to monitor_block_backends
to make that fact clear.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz a55448b368 qapi: Drop QERR_UNKNOWN_BLOCK_FORMAT_FEATURE
Just specifying a custom string is simpler in basically all places that
used it, and in addition, specifying the BB or node name is something we
generally do not do in other error messages when opening a BDS, so we
should not do it here.

This changes the output for iotest 036 (to the better, in my opinion),
so the reference output needs to be changed accordingly.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz 1393f21270 block: Add blk_commit_all()
Later, we will remove bdrv_commit_all() and move its contents here, and
in order to replace bdrv_commit_all() calls by calls to blk_commit_all()
before doing so, we need to add it as an alias now.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz 74d1b8fc27 block: Use blk_next() in block-backend.c
Instead of iterating directly through blk_backends, we can use
blk_next() instead. This gives us some abstraction from the list itself
which we can use to rename it, for example.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Stefan Hajnoczi 1f3ddfcb25 Revert "qed: Implement .bdrv_drain"
This reverts commit df9a681dc9.

Note that commit df9a681dc9 included some
unrelated hunks, possibly due to a merge failure or an overlooked
squash.  This only reverts the qed .bdrv_drain() implementation.

The qed .bdrv_drain() implementation is unsafe and can lead to a double
request completion.

Paolo Bonzini reports:
"The problem is that bdrv_qed_drain calls qed_plug_allocating_write_reqs
unconditionally, but this is not correct if an allocating write is
queued.  In this case, qed_unplug_allocating_write_reqs will restart the
allocating write and possibly cause it to complete.  The aiocb however
is still in use for the L2/L1 table writes, and will then be completed
again as soon as the table writes are stable."

For QEMU 2.6 we can simply revert this commit.  A full solution for the
qed need check timer may be added if the bdrv_drain() implementation is
extended.

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1457431876-8475-1-git-send-email-stefanha@redhat.com
2016-03-17 09:50:14 +00:00
Jeff Cody 03c698f0a2 block/sheepdog: fix argument passed to qemu_strtoul()
The function qemu_strtoul() reads 'unsigned long' sized data,
which is larger than uint32_t on 64-bit machines.

Even though the snap_id field in the header is 32-bits, we must
accommodate the full size in qemu_strtoul().

This patch also adds more meaningful error handling to the
qemu_strtoul() call, and subsequent results.

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Message-id: e56fc50abedd9a112e0683342c8eafda063cd2f9.1456935548.git.jcody@redhat.com
2016-03-16 13:25:29 -04:00
Alberto Garcia b9c600d207 quorum: Fix crash in quorum_aio_cb()
quorum_aio_cb() emits the QUORUM_REPORT_BAD event if there's
an I/O error in a Quorum child. However sacb->aiocb must be
correctly initialized for this to happen. read_quorum_children() and
read_fifo_child() are not doing this, which results in a QEMU crash.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 8138570d071ba7e25db3736979234a1fd71dbd05.1457610443.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-03-14 17:35:06 +01:00
Fam Zheng ebab225910 block: Move block dirty bitmap code to separate files
The only code change is making bdrv_dirty_bitmap_truncate public. It is
used in block.c.

Also two long lines (bdrv_get_dirty) are wrapped.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1457412306-18940-5-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-03-14 17:35:05 +01:00
Fam Zheng b2f56462d5 backup: Use Bitmap to replace "s->bitmap"
"s->bitmap" tracks done sectors, we only check bit states without using any
iterator which HBitmap is good for. Switch to "Bitmap" which is simpler and
more memory efficient.

Meanwhile, rename it to done_bitmap, to reflect the intention.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1457412306-18940-2-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-03-14 17:35:05 +01:00
Kevin Wolf b8f45cdf78 vpc: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:44 +01:00
Kevin Wolf c4bea1690e vmdk: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf 10bf03af12 vhdx: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf a08f0c3b5f vdi: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf fba98d455a sheepdog: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf 8a56fdadaf qed: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf 23588797b6 qcow2: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf 6af4016020 qcow: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf 8942764f54 parallels: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf c10c9d9615 block: Introduce blk_set_allow_write_beyond_eof()
We check that the guest can't write beyond the end of its disk, but for
other internal users it can make sense to allow growing a file.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf 6340472c54 block: Use writeback in .bdrv_create() implementations
There's no reason to use a writethrough cache mode while creating an
image.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Fam Zheng 71968dbfd8 vmdk: Switch to heap arrays for vmdk_parent_open
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Fam Zheng 5997c210b9 vmdk: Switch to heap arrays for vmdk_read_cid
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Fam Zheng 965415eb20 vmdk: Switch to heap arrays for vmdk_write_cid
It is only called once for each opened image, so we can do it the easy
way.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Changlong Xie 924e8a2bbc quorum: modify vote rules for flush operation
Keep flush interface the same logic as quorum read/write, Otherwise in
following scenario, we'll encounter unexpected errors.

Quorum has two children(A, B). A do flush sucessfully, but B flush failed.
This cause the filesystem of guest become read-only with following errors:

end_request: I/O error, dev vda, sector 11159960
Aborting journal on device vda3-8
EXT4-fs error (device vda3): ext4_journal_start_sb:327: Detected abort journal
EXT4-fs (vda3): Remounting filesystem read-only

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Changlong Xie 0ae053b7e1 qmp event: Refactor QUORUM_REPORT_BAD
Introduce QuorumOpType, and make QUORUM_REPORT_BAD compatible
with it.

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Jeff Cody fb9245c261 block/vpc: give option to force the current_size field in .bdrv_create
When QEMU creates a VHD image, it goes by the original spec,
calculating the current_size based on the nearest CHS geometry (with an
exception for disks > 127GB).

Apparently, Azure will only allow images that are sized to the nearest
MB, and the current_size as calculated from CHS cannot guarantee that.

Allow QEMU to create images similar to how Hyper-V creates images, by
setting current_size to the specified virtual disk size.  This
introduces an option, force_size, to be passed to the vpc format during
image creation, e.g.:

    qemu-img convert -f raw -o force_size -O vpc test.img test.vhd

When using the "force_size" option, the creator app field used by
QEMU will be "qem2" instead of "qemu", to indicate the difference.
In light of this, we also add parsing of the "qem2" field during
vpc_open.

Bug reference: https://bugs.launchpad.net/qemu/+bug/1490611

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:42 +01:00
Jeff Cody c540d53ac8 block/vpc: choose size calculation method based on creator_app field
The VHD file format is used by both Virtual PC, and Hyper-V.  However,
how the virtual disk size is calculated varies between the two.

Virtual PC uses the CHS drive parameters to determine the drive size.
Hyper-V, on the other hand, uses the current_size field in the footer
when determining image size.

This is problematic for a few reasons:

* VHD images from Hyper-V, using CHS calculations, will likely be
  trunctated.

* If we just rely always on current_size, then QEMU may have data
  compatibility issues with Virtual PC (we may write too much data
  into a VHD file to be used by Virtual PC, for instance).

* Existing VHD images created by QEMU have used the CHS calculations,
  except for images exceeding the 127GB limit.  We want to remain
  compatible with our own generated images.

Luckily, the VHD specification defines a 'Creator App' field, that is
used to indicate what software created the VHD file.

This patch does two things:

    1. Uses the 'Creator App' field to help determine how to calculate
       size, and

    2. Adds a VPC format option 'force_size_calc', so that the user can
       override the 'Creator App' auto-detection, in case there exist
       VHD images with unknown or contradictory 'Creator App' entries.

N.B.: We currently use the maximum CHS value as an indication to use the
current_size field.  This patch does not change that, even with the
'force_size_calc' option.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:42 +01:00
Kevin Wolf c21cc6ca98 block/qapi: Include empty drives in query-blockstats
Since commit 5ec18f8c, query-blockstats didn't return the statistics of
drives without media any more because such drives have only a BB now,
but not a BDS any more.

This patch fixes the regression so that query-blockstats iterates over
BBs by default and empty drives are displayed again.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-14 16:46:42 +01:00
Kevin Wolf b07363a1a3 block/qapi: Factor out bdrv_query_bds_stats()
The new functions handles the data that is taken from the
BlockDriverState.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-14 16:46:42 +01:00
Kevin Wolf 2b77e60ab8 block/qapi: Factor out bdrv_query_blk_stats()
The new functions handles the data that is taken from the BlockBackend.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-14 16:46:42 +01:00
Daniel P. Berrange b16a44e13e osdep: remove use of socket_error() from all code
Now that QEMU wraps the Win32 sockets methods to automatically
set errno upon failure, there is no reason for callers to use
the socket_error() method. They can rely on accessing errno
even on Win32. Remove all use of socket_error() from general
code, leaving it as a static method in oslib-win32.c only.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-10 17:19:34 +00:00
Eric Blake 0399293e5b util: Shorten references into SocketAddress
An upcoming patch will alter how simple unions, like SocketAddress,
are laid out, which will impact all lines of the form 'addr->u.XXX'
(expanding it to the longer 'addr->u.XXX.data').  For better
legibility in that patch, and less need for line wrapping, it's better
to use a temporary variable to reduce the effect of a layout change to
just the variable initializations, rather than every reference within
a SocketAddress.  Also, take advantage of some C99 initialization where
it makes sense (simplifying g_new0() to g_new()).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1457021813-10704-7-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-05 10:41:52 +01:00
John Snow 4c9bca7e39 block/backup: avoid copying less than full target clusters
During incremental backups, if the target has a cluster size that is
larger than the backup cluster size and we are backing up to a target
that cannot (for whichever reason) pull clusters up from a backing image,
we may inadvertantly create unusable incremental backup images.

For example:

If the bitmap tracks changes at a 64KB granularity and we transmit 64KB
of data at a time but the target uses a 128KB cluster size, it is
possible that only half of a target cluster will be recognized as dirty
by the backup block job. When the cluster is allocated on the target
image but only half populated with data, we lose the ability to
distinguish between zero padding and uninitialized data.

This does not happen if the target image has a backing file that points
to the last known good backup.

Even if we have a backing file, though, it's likely going to be faster
to just buffer the redundant data ourselves from the live image than
fetching it from the backing file, so let's just always round up to the
target granularity.

The same logic applies to backup modes top, none, and full. Copying
fractional clusters without the guarantee of COW is dangerous, but even
if we can rely on COW, it's likely better to just re-copy the data.

Reported-by: Fam Zheng <famz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1456433911-24718-3-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:55:14 -05:00
John Snow 16096a4d47 block/backup: make backup cluster size configurable
64K might not always be appropriate, make this a runtime value.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1456433911-24718-2-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:55:14 -05:00
Fam Zheng 21cd917ff5 mirror: Add mirror_wait_for_io
The three lines are duplicated a number of times now, refactor a
function.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1454637630-10585-3-git-send-email-famz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:31 -05:00
Fam Zheng e5b43573e2 mirror: Rewrite mirror_iteration
The "pnum < nb_sectors" condition in deciding whether to actually copy
data is unnecessarily strict, and the qiov initialization is
unnecessarily for bdrv_aio_write_zeroes and bdrv_aio_discard.

Rewrite mirror_iteration to fix both flaws.

The output of iotests 109 is updated because we now report the offset
and len slightly differently in mirroring progress.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1454637630-10585-2-git-send-email-famz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:31 -05:00
Max Reitz 04a3615860 vhdx: Simplify vhdx_set_shift_bits()
For values which are powers of two (and we do assume all of these to
be), sizeof(x) * 8 - 1 - clz(x) == ctz(x). Therefore, use ctz().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 1450451066-13335-3-git-send-email-mreitz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:31 -05:00
Max Reitz 939901dcd2 vhdx: DIV_ROUND_UP() in vhdx_calc_bat_entries()
We have DIV_ROUND_UP(), so we can use it to produce more easily readable
code. It may be slower than the bit shifting currently performed
(because it actually performs a division), but since
vhdx_calc_bat_entries() is never used in a hot path, this is completely
fine.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 1450451066-13335-2-git-send-email-mreitz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:31 -05:00
Daniel P. Berrange b189346eb1 iscsi: add support for getting CHAP password via QCryptoSecret API
The iSCSI driver currently accepts the CHAP password in plain text
as a block driver property. This change adds a new "password-secret"
property that accepts the ID of a QCryptoSecret instance.

  $QEMU \
     -object secret,id=sec0,filename=/home/berrange/example.pw \
     -drive driver=iscsi,url=iscsi://example.com/target-foo/lun1,\
            user=dan,password-secret=sec0

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1453385961-10718-4-git-send-email-berrange@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:31 -05:00
Daniel P. Berrange 1bff960642 curl: add support for HTTP authentication parameters
If connecting to a web server which has authentication
turned on, QEMU gets a 401 as curl has not been configured
with any authentication credentials.

This adds 4 new parameters to the curl block driver
options 'username', 'password-secret', 'proxy-username'
and 'proxy-password-secret'. Passwords are provided using
the recently added 'secret' object type

 $QEMU \
     -object secret,id=sec0,filename=/home/berrange/example.pw \
     -object secret,id=sec1,filename=/home/berrange/proxy.pw \
     -drive driver=http,url=http://example.com/some.img,\
            username=dan,password-secret=sec0,\
            proxy-username=dan,proxy-password-secret=sec1

Of course it is possible to use the same secret for both the
proxy & server passwords if desired, or omit the proxy auth
details, or the server auth details as required.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1453385961-10718-3-git-send-email-berrange@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:31 -05:00
Daniel P. Berrange 60390a2192 rbd: add support for getting password from QCryptoSecret object
Currently RBD passwords must be provided on the command line
via

  $QEMU -drive file=rbd:pool/image:id=myname:\
               key=QVFDVm41aE82SHpGQWhBQXEwTkN2OGp0SmNJY0UrSE9CbE1RMUE=:\
               auth_supported=cephx

This is insecure because the key is visible in the OS process
listing.

This adds support for an 'password-secret' parameter in the RBD
parameters that can be used with the QCryptoSecret object to
provide the password via a file:

  echo "QVFDVm41aE82SHpGQWhBQXEwTkN2OGp0SmNJY0UrSE9CbE1RMUE=" > poolkey.b64
  $QEMU -object secret,id=secret0,file=poolkey.b64,format=base64 \
        -drive driver=rbd,filename=rbd:pool/image:id=myname:\
               auth_supported=cephx,password-secret=secret0

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1453385961-10718-2-git-send-email-berrange@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:30 -05:00
Vasiliy Tolstov eab8eb8db3 sheepdog: allow to delete snapshot
This patch implements a blockdriver function bdrv_snapshot_delete() in
the sheepdog driver. With the new function, snapshots of sheepdog can
be deleted from libvirt.

Cc: Jeff Cody <jcody@redhat.com>
Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Vasiliy Tolstov <v.tolstov@selfip.ru>
Message-id: 1450873346-22334-1-git-send-email-mitake.hitoshi@lab.ntt.co.jp
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:30 -05:00
Peter Lieven 7725b8bf12 block/nfs: add support for setting debug level
recent libnfs versions support logging debug messages. Add
support for it in qemu through an URL parameter.

Example:
 qemu -cdrom nfs://127.0.0.1/iso/my.iso?debug=2

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1447052973-14513-1-git-send-email-pl@kamp.de
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:30 -05:00
Alberto Garcia 398befdf50 qapi: Add burst length fields to BlockDeviceInfo
This patch adds the new bps_*_max_length and iops_*_max_length
parameters to the BlockDeviceInfo struct.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-02-22 14:08:06 +01:00
Changlong Xie f38738e212 quorum: fix segfault when read fails in fifo mode
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-02-22 09:49:46 +01:00
Daniel P. Berrange 75822a12c0 nbd: enable use of TLS with NBD block driver
This modifies the NBD driver so that it is possible to request
use of TLS. This is done by providing the 'tls-creds' parameter
with the ID of a previously created QCryptoTLSCreds object.

For example

  $QEMU -object tls-creds-x509,id=tls0,endpoint=client,\
                dir=/home/berrange/security/qemutls \
        -drive driver=nbd,host=localhost,port=9000,tls-creds=tls0

The client will drop the connection if the NBD server does not
provide TLS.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1455129674-17255-15-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-16 17:16:33 +01:00
Daniel P. Berrange f95910fe6b nbd: implement TLS support in the protocol negotiation
This extends the NBD protocol handling code so that it is capable
of negotiating TLS support during the connection setup. This involves
requesting the STARTTLS protocol option before any other NBD options.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1455129674-17255-14-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-16 17:16:28 +01:00
Daniel P. Berrange 1c778ef729 nbd: convert to using I/O channels for actual socket I/O
Now that all callers are converted to use I/O channels for
initial connection setup, it is possible to switch the core
NBD protocol handling core over to use QIOChannel APIs for
actual sockets I/O.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1455129674-17255-7-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-16 17:13:57 +01:00
Daniel P. Berrange 064097d919 nbd: convert block client to use I/O channels for connection setup
This converts the NBD block driver client to use the QIOChannelSocket
class for initial connection setup. The NbdClientSession struct has
two pointers, one to the master QIOChannelSocket providing the raw
data channel, and one to a QIOChannel which is the current channel
used for I/O. Initially the two point to the same object, but when
TLS support is added, they will point to different objects.

The qemu-img & qemu-io tools now need to use MODULE_INIT_QOM to
ensure the QIOChannel object classes are registered. The qemu-nbd
tool already did this.

In this initial conversion though, all I/O is still actually done
using the raw POSIX sockets APIs.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1455129674-17255-4-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-16 17:13:22 +01:00
Peter Maydell f075c89f0a -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
 
 iQEcBAABAgAGBQJWugGdAAoJEJykq7OBq3PIE5cH/RY0iIsvDU4dD05Xs/xlDPNL
 0lPwqmYIhGPCYdEQIhLsnf8bfYkyXGaS/bgWFzKk6mb4XI6HhcDdEoOXOSa3Rreq
 bm58ngE9xwC5ZiDuBFxzMXuoU/+iAU/YHiTxL84JaMpBjUQsUCvHlLMVXJXGphzG
 pXE5neYuy+MftiZ5f0nlVaIGCdd9rHWkKMgpViR0Xanx1cZe0TSkh12qlCRLXNJG
 SHDq2wNrFp7lV4IK2evcU59GvJ5tfXyaDEh7yCMrvMUFM96jBjSgtgKbuE+t4VWi
 6Awli6z9OvygSu0s/GuJ7yW41RSiTv/Qu1TagltkNjuSB5Gzw3lTgmorgVlLS+Y=
 =PZTe
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

# gpg: Signature made Tue 09 Feb 2016 15:11:25 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"

* remotes/stefanha/tags/block-pull-request:
  block: add missing call to bdrv_drain_recurse
  blockjob: Fix hang in block_job_finish_sync
  iov: avoid memcpy for "simple" iov_from_buf/iov_to_buf

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-02-09 17:56:46 +00:00
Paolo Bonzini 9dcf8ecd9e block: add missing call to bdrv_drain_recurse
This is also needed in bdrv_drain_all, not just in bdrv_drain.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1450867706-19860-3-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-02-09 13:52:26 +00:00
Eric Blake 51e72bc1dd qapi: Swap visit_* arguments for consistent 'name' placement
JSON uses "name":value, but many of our visitor interfaces were
called with visit_type_FOO(v, &value, name, errp).  This can be
a bit confusing to have to mentally swap the parameter order to
match JSON order.  It's particularly bad for visit_start_struct(),
where the 'name' parameter is smack in the middle of the
otherwise-related group of 'obj, kind, size' parameters! It's
time to do a global swap of the parameter ordering, so that the
'name' parameter is always immediately after the Visitor argument.

Additional reason in favor of the swap: the existing include/qjson.h
prefers listing 'name' first in json_prop_*(), and I have plans to
unify that file with the qapi visitors; listing 'name' first in
qapi will minimize churn to the (admittedly few) qjson.h clients.

Later patches will then fix docs, object.h, visitor-impl.h, and
those clients to match.

Done by first patching scripts/qapi*.py by hand to make generated
files do what I want, then by running the following Coccinelle
script to affect the rest of the code base:
 $ spatch --sp-file script `git grep -l '\bvisit_' -- '**/*.[ch]'`
I then had to apply some touchups (Coccinelle insisted on TAB
indentation in visitor.h, and botched the signature of
visit_type_enum() by rewriting 'const char *const strings[]' to
the syntactically invalid 'const char*const[] strings').  The
movement of parameters is sufficient to provoke compiler errors
if any callers were missed.

    // Part 1: Swap declaration order
    @@
    type TV, TErr, TObj, T1, T2;
    identifier OBJ, ARG1, ARG2;
    @@
     void visit_start_struct
    -(TV v, TObj OBJ, T1 ARG1, const char *name, T2 ARG2, TErr errp)
    +(TV v, const char *name, TObj OBJ, T1 ARG1, T2 ARG2, TErr errp)
     { ... }

    @@
    type bool, TV, T1;
    identifier ARG1;
    @@
     bool visit_optional
    -(TV v, T1 ARG1, const char *name)
    +(TV v, const char *name, T1 ARG1)
     { ... }

    @@
    type TV, TErr, TObj, T1;
    identifier OBJ, ARG1;
    @@
     void visit_get_next_type
    -(TV v, TObj OBJ, T1 ARG1, const char *name, TErr errp)
    +(TV v, const char *name, TObj OBJ, T1 ARG1, TErr errp)
     { ... }

    @@
    type TV, TErr, TObj, T1, T2;
    identifier OBJ, ARG1, ARG2;
    @@
     void visit_type_enum
    -(TV v, TObj OBJ, T1 ARG1, T2 ARG2, const char *name, TErr errp)
    +(TV v, const char *name, TObj OBJ, T1 ARG1, T2 ARG2, TErr errp)
     { ... }

    @@
    type TV, TErr, TObj;
    identifier OBJ;
    identifier VISIT_TYPE =~ "^visit_type_";
    @@
     void VISIT_TYPE
    -(TV v, TObj OBJ, const char *name, TErr errp)
    +(TV v, const char *name, TObj OBJ, TErr errp)
     { ... }

    // Part 2: swap caller order
    @@
    expression V, NAME, OBJ, ARG1, ARG2, ERR;
    identifier VISIT_TYPE =~ "^visit_type_";
    @@
    (
    -visit_start_struct(V, OBJ, ARG1, NAME, ARG2, ERR)
    +visit_start_struct(V, NAME, OBJ, ARG1, ARG2, ERR)
    |
    -visit_optional(V, ARG1, NAME)
    +visit_optional(V, NAME, ARG1)
    |
    -visit_get_next_type(V, OBJ, ARG1, NAME, ERR)
    +visit_get_next_type(V, NAME, OBJ, ARG1, ERR)
    |
    -visit_type_enum(V, OBJ, ARG1, ARG2, NAME, ERR)
    +visit_type_enum(V, NAME, OBJ, ARG1, ARG2, ERR)
    |
    -VISIT_TYPE(V, OBJ, NAME, ERR)
    +VISIT_TYPE(V, NAME, OBJ, ERR)
    )

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1454075341-13658-19-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-02-08 17:29:56 +01:00
Fam Zheng ac987b30d0 block: Use returned *file in bdrv_co_get_block_status
Now that all drivers return the right "file" pointer, we can use it.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1453780743-16806-14-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Fam Zheng e0f100f57c vmdk: Return extent's file in bdrv_get_block_status
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-13-git-send-email-famz@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Fam Zheng d0a18f1025 vmdk: Fix calculation of block status's offset
"offset" is the offset of cluster and sector_num doesn't necessarily
refer to the start of it, it should add index_in_cluster.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-12-git-send-email-famz@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Fam Zheng 7429e20788 vpc: Assign bs->file->bs to file in vpc_co_get_block_status
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-11-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Fam Zheng 8bfb137152 vdi: Assign bs->file->bs to file in vdi_co_get_block_status
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-10-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Fam Zheng d234c92931 sheepdog: Assign bs to file in sd_co_get_block_status
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-9-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Fam Zheng 53f1dfd1ff qed: Assign bs->file->bs to file in bdrv_qed_co_get_block_status
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-8-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Fam Zheng ddf4987d76 parallels: Assign bs->file->bs to file in parallels_co_get_block_status
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-7-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Fam Zheng 3399833f14 iscsi: Assign bs to file in iscsi_co_get_block_status
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-6-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Fam Zheng 02650acbc6 raw: Assign bs to file in raw_co_get_block_status
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-5-git-send-email-famz@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Fam Zheng 178b4db7e5 qcow2: Assign bs->file->bs to file in qcow2_co_get_block_status
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-4-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Fam Zheng 3064bf6fff qcow: Assign bs->file->bs to file in qcow_co_get_block_status
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-3-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Fam Zheng 67a0fd2a9b block: Add "file" output parameter to block status query functions
The added parameter can be used to return the BDS pointer which the
valid offset is referring to. Its value should be ignored unless
BDRV_BLOCK_OFFSET_VALID in ret is set.

Until block drivers fill in the right value, let's clear it explicitly
right before calling .bdrv_get_block_status.

The "bs->file" condition in bdrv_co_get_block_status is kept now to keep iotest
case 102 passing, and will be fixed once all drivers return the right file
pointer.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-2-git-send-email-famz@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Paolo Bonzini 1963f8d52e block: acquire in bdrv_query_image_info
NFS calls aio_poll inside bdrv_get_allocated_size.  This requires
acquiring the AioContext.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1450867706-19860-1-git-send-email-pbonzini@redhat.com
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Max Reitz d8da3cef3b block: Add blk_remove_all_bs()
When bdrv_close_all() is called, instead of force-closing all root
BlockDriverStates, it is better to just drop the reference from all
BlockBackends and let them be closed automatically. This prevents BDS
from getting closed that are still referenced by other BDS, which may
result in loss of cached data.

This patch adds a function for doing that, but does not yet incorporate
it in bdrv_close_all().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-02-02 17:50:46 +01:00
Max Reitz 13855c6b9f block: Use blk_remove_bs() in blk_delete()
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-02-02 17:50:46 +01:00
Max Reitz 033cb5659a block: Remove BDS close notifier
It is unused now, so we can remove it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-02-02 17:50:46 +01:00
Max Reitz 3301f6c6e9 block: Add BB-BDS remove/insert notifiers
bdrv_close() no longer signifies ejection of a medium, this is now done
by removing the BDS from the BB. Therefore, we want to have a notifier
for that in the BB instead of a close notifier in the BDS. The former is
added now, the latter is removed later.

Symmetrically, another notifier list is added that is invoked whenever a
BDS is inserted. We will need that for virtio-blk and virtio-scsi, which
can then remove their op blockers on BDS ejection and set them up on
insertion.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-02-02 17:50:46 +01:00
Fam Zheng 3db1d98a20 vmdk: Fix converting to streamOptimized
Commit d62d9dc4b8 lifted streamOptimized images's version to 3, but we
now refuse to open version 3 images read-write.  We need to make
streamOptimized an exception to allow converting to it. This fixes the
accidentally broken iotests case 059 for the same reason.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:49:34 +01:00
Max Reitz 327032ce74 block/qapi: Emit tray_open only if there is a tray
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 1454096953-31773-5-git-send-email-mreitz@redhat.com
2016-02-02 17:47:06 +01:00
Max Reitz 8f3a73bc57 block: Add blk_dev_has_tray()
Pull out the check whether a block device has a tray from
blk_dev_is_tray_open() into its own function so both attributes (whether
there is a tray vs. whether that tray is open) can be queried
independently.

Cc: qemu-stable <qemu-stable@nongnu.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 1454096953-31773-2-git-send-email-mreitz@redhat.com
2016-02-02 17:46:56 +01:00
Fam Zheng d62d9dc4b8 vmdk: Create streamOptimized as version 3
VMware products accept only version 3 for streamOptimized, let's bump
the version.

Reported-by: Radoslav Gerganov <rgerganov@vmware.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-01-20 13:36:24 +01:00
Kevin Wolf 191fb11bdf qcow2: Make image inaccessible after failed qcow2_invalidate_cache()
If qcow2_invalidate_cache() fails, we are in a state where qcow2_close()
has already been completed, but the image hasn't been reopened yet.
Calling into any qcow2 function for an image in this state will cause
crashes.

The real solution would be to get rid of the close/open pair and instead
do an atomic reset of the involved data structures, but this isn't
trivial, so let's just make the image inaccessible for now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-01-20 13:36:24 +01:00
Kevin Wolf 140fd5a69c qcow2: Fix BDRV_O_INACTIVE handling in qcow2_invalidate_cache()
What qcow2_invalidate_cache() should do is close the image with
BDRV_O_INACTIVE set and reopen it with the flag cleared. In fact, it
used to do exactly the opposite: qcow2_close() relied on bs->open_flags,
which is already updated to have cleared BDRV_O_INACTIVE at this point,
whereas qcow2_open() was called with s->flags, which has the flag still
set. Fix this.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-01-20 13:36:23 +01:00
Kevin Wolf ec6d891224 qcow2: Implement .bdrv_inactivate
The callback has to ensure that closing or flushing the image afterwards
wouldn't cause a write access to the image files. This means that just
the caches have to be written out, which is part of the existing
.bdrv_close implementation.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-01-20 13:36:23 +01:00
Kevin Wolf 04c01a5c8f block: Rename BDRV_O_INCOMING to BDRV_O_INACTIVE
Instead of covering only the state of images on the migration
destination before the migration is completed, the flag will also cover
the state of images on the migration source after completion. This
common state implies that the image is technically still open, but no
writes will happen and any cached contents will be reloaded from disk if
and when the image leaves this state.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-01-20 13:36:23 +01:00
Kevin Wolf 09e0c771e4 block: Assert no write requests under BDRV_O_INCOMING
As long as BDRV_O_INCOMING is set, the image file is only opened so we
have a file descriptor for it. We're definitely not supposed to modify
the image, it's still owned by the migration source.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-01-20 13:36:23 +01:00
Kevin Wolf b527c9b392 qcow2: Write full header on image creation
When creating a qcow2 image, we didn't necessarily call
qcow2_update_header(), but could end up with the basic header that
qcow2_create2() created manually. One thing that this basic header
lacks is the feature table. Let's make sure that it's always present.

This requires a few updates to test cases as well.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-01-20 13:36:23 +01:00
Kevin Wolf 1a4828c793 qcow2: Write feature table only for v3 images
Version 2 images don't have feature bits, so writing a feature table to
those images is kind of pointless.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-01-20 13:36:23 +01:00
Peter Maydell 80c71a241a block: Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-01-20 13:36:23 +01:00
Christian Borntraeger 972b543c6b block/raw-posix: avoid bogus fixup for cylinders on DASD disks
large volume DASD that have > 64k cylinders do claim to have
0xFFFE cylinders as special value in the old 16 bit field. We
want to pass this "token" along to the guest, instead of
calculating the real number. Otherwise qemu might fail with
"cyls must be between 1 and 65535"

Cc: qemu-stable@nongnu.org
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-01-19 17:43:55 +01:00
Paolo Bonzini f1c17521e7 nbd-server: do not exit on failed memory allocation
The amount of memory allocated in nbd_co_receive_request is driven by the
NBD client (possibly a virtual machine).  Parallel I/O can cause the
server to allocate a large amount of memory; check for failures and
return ENOMEM in that case.

Cc: qemu-block@nongnu.org
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-01-15 18:58:02 +01:00
Zhu Lingshan 1cb6d137ff iscsi: send readcapacity10 when readcapacity16 failed
When play with Dell MD3000 target, for sure it
is a TYPE_DISK, but readcapacity16 would fail.
Then we find that readcapacity10 succeeded. It
looks like the target just support readcapacity10
even through it is a TYPE_DISK or have some
TYPE_ROM characteristics.

This patch can give a chance to send
readcapacity16 when readcapacity10 failed.
This patch is not harmful to original pathes

Signed-off-by: Zhu Lingshan <lszhu@suse.com>
Message-Id: <1451359934-9236-1-git-send-email-lszhu@suse.com>
[Don't fall through on UNIT ATTENTION. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-01-15 18:58:01 +01:00
Markus Armbruster bf89e87427 vhdx: Fix "log that needs to be replayed" error message
The arguments of error_setg_errno() should yield a short error string
without newlines.

Here, we try to append additional help to the error message by
embedding newlines in the error string.  That's nice, but it's doesn't
play nicely with the errno part.  tests/qemu-iotests/070.out shows the
resulting mess:

    can't open device TEST_DIR/iotest-dirtylog-10G-4M.vhdx: VHDX image file 'TEST_DIR/iotest-dirtylog-10G-4M.vhdx' opened read-only, but contains a log that needs to be replayed.  To replay the log, execute:
     qemu-img check -r all 'TEST_DIR/iotest-dirtylog-10G-4M.vhdx': Operation not permitted

Switch to error_setg() and error_append_hint().  Result:

    can't open device TEST_DIR/iotest-dirtylog-10G-4M.vhdx: VHDX image file 'TEST_DIR/iotest-dirtylog-10G-4M.vhdx' opened read-only, but contains a log that needs to be replayed
    To replay the log, run:
    qemu-img check -r all 'TEST_DIR/iotest-dirtylog-10G-4M.vhdx'

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1450452927-8346-21-git-send-email-armbru@redhat.com>
2016-01-13 15:16:18 +01:00
Markus Armbruster d28d737fb9 vmdk: Clean up "Invalid extent lines" error message
vmdk_parse_extents() reports parse errors like this:

    error_setg(errp, "Invalid extent lines:\n%s", p);

where p points to the beginning of the malformed line in the image
descriptor.  This results in a multi-line error message

    Invalid extent lines:
    <first line that doesn't parse>
    <remaining text that may or may not parse, if any>

Error messages should not have newlines embedded.  Since the remaining
text is not helpful, we can simply report:

    Invalid extent line: <first line that doesn't parse>

Cc: Fam Zheng <famz@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1450452927-8346-19-git-send-email-armbru@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-01-13 15:16:18 +01:00
Markus Armbruster e4937694b6 vmdk: Clean up control flow in vmdk_parse_extents() a bit
Factor out loop stepping to turn a while-loop with goto into a
for-loop with continue.

Cc: Fam Zheng <famz@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-Id: <1450452927-8346-18-git-send-email-armbru@redhat.com>
2016-01-13 15:16:18 +01:00
Markus Armbruster 9af9e0fed7 error: Strip trailing '\n' from error string arguments (again)
Commit 6daf194d, be62a2eb and 312fd5f got rid of a bunch, but they
keep coming back.  Tracked down with the Coccinelle semantic patch
from commit 312fd5f.

Cc: Fam Zheng <famz@redhat.com>
Cc: Peter Crosthwaite <crosthwaitepeter@gmail.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Cc: Stefan Berger <stefanb@linux.vnet.ibm.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Changchun Ouyang <changchun.ouyang@intel.com>
Cc: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Markus Armbruster <armbru@pond.sub.org>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Acked-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1450452927-8346-17-git-send-email-armbru@redhat.com>
2016-01-13 15:16:18 +01:00
Markus Armbruster e43bfd9c87 error: Use error_prepend() where it makes obvious sense
Done with this Coccinelle semantic patch

    @@
    expression FMT, E1, E2;
    expression list ARGS;
    @@
    -    error_setg(E1, FMT, ARGS, error_get_pretty(E2));
    +    error_propagate(E1, E2);/*###*/
    +    error_prepend(E1, FMT/*@@@*/, ARGS);

followed by manual cleanup, first because I can't figure out how to
make Coccinelle transform strings, and second to get rid of now
superfluous error_propagate().

We now use or propagate the original error whole instead of just its
message obtained with error_get_pretty().  This avoids suppressing its
hint (see commit 50b7b00), but I can't see how the errors touched in
this commit could come with hints.  It also improves the message
printed with &error_abort when we screw up (see commit 1e9b65b).

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-01-13 15:16:17 +01:00
Markus Armbruster c29b77f955 error: Use error_reportf_err() where it makes obvious sense
Done with this Coccinelle semantic patch

    @@
    expression FMT, E, S;
    expression list ARGS;
    @@
    -    error_report(FMT, ARGS, error_get_pretty(E));
    +    error_reportf_err(E, FMT/*@@@*/, ARGS);
    (
    -    error_free(E);
    |
	 exit(S);
    |
	 abort();
    )

followed by a replace of '%s"/*@@@*/' by '"' and some line rewrapping,
because I can't figure out how to make Coccinelle transform strings.

We now use the error whole instead of just its message obtained with
error_get_pretty().  This avoids suppressing its hint (see commit
50b7b00), but I can't see how the errors touched in this commit could
come with hints.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1450452927-8346-12-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-01-13 15:16:17 +01:00
Markus Armbruster 4fffeb5e19 error: Use error_report_err() where appropriate (again)
Same Coccinelle semantic patch as in commit 565f65d.

We now use the original error whole instead of just its message
obtained with error_get_pretty().  This avoids suppressing its hint
(see commit 50b7b00), but I don't think the errors touched in this
commit can come with hints.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1450452927-8346-3-git-send-email-armbru@redhat.com>
2016-01-13 15:16:16 +01:00
Zhu Lingshan 240125bc49 iscsi: fix readcapacity error message
fix:The error message for readcapacity 16 incorrectly mentioned
a readcapacity 10 failure, fixed the error message.

Signed-off-by: Zhu Lingshan <lszhu@suse.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-01-11 11:39:28 +03:00
Fam Zheng 0fa296eb00 block/qapi: Clear err for further error
Since a5002d5 (block/qapi: allow best-effort query) we don't return at
this error, however err must be cleared before passing to
bdrv_query_snapshot_info_list below, as required by error API.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1450779107-26765-1-git-send-email-famz@redhat.com
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-01-07 21:30:17 +01:00
Stefan Hajnoczi 3515727f31 block/mirror: replace IOV_MAX with blk_get_max_iov()
Use blk_get_max_iov() instead of hardcoding IOV_MAX, which may not apply
to all BlockDrivers.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-12-22 16:01:07 +08:00
Stefan Hajnoczi 222565f65c block: replace IOV_MAX with BlockLimits.max_iov
Request merging must not result in a huge request that exceeds the
maximum number of iovec elements.  Use BlockLimits.max_iov instead of
hardcoding IOV_MAX.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-12-22 16:01:07 +08:00
Stefan Hajnoczi 648296e067 block-backend: add blk_get_max_iov()
Add a function to query BlockLimits.max_iov.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-12-22 16:01:07 +08:00
Stefan Hajnoczi bd44feb754 block: add BlockLimits.max_iov field
The maximum number of struct iovec elements depends on the
BlockDriverState.  The raw-posix and iSCSI protocols have a maximum of
IOV_MAX but others could have different values.

Cc: Peter Lieven <pl@kamp.de>
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-12-22 16:01:07 +08:00
John Snow a5002d5302 block/qapi: allow best-effort query
For more complex BDS trees that can be created under normal circumstances,
we lose the ability to issue query commands because of our inability to
re-construct the absolute filename.

Instead, omit this field when it is a problem and present as much information
as we can.

This will change the expected output in iotest 110, where we will now see a
json filename and the lack of an absolute filename instead of an error.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1450122916-4706-6-git-send-email-jsnow@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-12-18 14:36:17 +01:00
John Snow 5c9d9ca597 block/qapi: explicitly warn if !has_full_backing_filename
Disambiguate "Backing filename and full backing filename are equivalent"
from "full backing filename could not be determined."

Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1450122916-4706-4-git-send-email-jsnow@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-12-18 14:36:17 +01:00
John Snow 12dcb1c018 block/qapi: always report full_backing_filename
Always report full_backing_filename, even if it's the same as
backing_filename. In the next patch, full_backing_filename may be
omitted if it cannot be generated instead of allowing e.g. drive_query
to abort if it runs into this scenario.

The presence or absence of the "full" field becomes useful information.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1450122916-4706-3-git-send-email-jsnow@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-12-18 14:36:17 +01:00
John Snow 548e1ff379 block/qapi: do not redundantly print "actual path"
If it happens to match the backing path, that was the actual path.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1450122916-4706-2-git-send-email-jsnow@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-12-18 14:36:17 +01:00
Paolo Bonzini ba88944495 block: fix bdrv_ioctl called from coroutine
When called from a coroutine, bdrv_ioctl must be asynchronous just like
e.g. bdrv_flush.  The code was incorrectly making it synchronous, fix
it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-12-18 14:34:44 +01:00
Paolo Bonzini 27a7649a48 block: use drained section around bdrv_snapshot_delete
Do not use bdrv_drain, since by itself it does not guarantee
anything.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-12-18 14:34:44 +01:00
Kevin Wolf d657c0c289 raw-posix: Make aio=native option binding
Traditionally, aio=native was treated as an advice that could simply be
ignored if an error occurs while initialising Linux AIO or the feature
wasn't compiled in. This behaviour was deprecated in commit 96518254
(qemu 2.3; error during init) and commit 1501ecc1 (qemu 2.5; not
compiled in).

This patch changes raw-posix to error out in these cases instead of
printing a deprecation warning.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-12-18 14:34:43 +01:00
Denis V. Lunev b1fc8f934b qcow2: insert assert into qcow2_get_specific_info()
s->qcow_version is always set to 2 or 3. Let's assert if this is wrong.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-12-18 14:34:43 +01:00
Max Reitz 03bb78ed25 qcow2: Point to amend function in check
If a reference count is not representable with the current refcount
order, the image check should point to qemu-img amend for increasing the
refcount order. However, qemu-img amend needs write access to the image
which cannot be provided if the image is marked corrupt; and the image
check will not mark the image consistent unless everything actually is
consistent.

Therefore, if an image is marked corrupt and the image check encounters
a reference count overflow, it cannot be fixed by using qemu-img amend
to increase the refcount order. Instead, one has to use qemu-img convert
to create a completely new copy of the image in this case.

Alternatively, we may want to give the user a way of manually removing
the corrupt flag, maybe through qemu-img amend, but this is not part of
this patch.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-12-18 14:34:43 +01:00
Max Reitz 61ce55fc02 qcow2: Invoke refcount order amendment function
Make use of qcow2_change_refcount_order() to support changing the
refcount order with qemu-img amend.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-12-18 14:34:43 +01:00
Max Reitz 791c9a004e qcow2: Add function for refcount order amendment
Add a function qcow2_change_refcount_order() which allows changing the
refcount order of a qcow2 image.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-12-18 14:34:43 +01:00
Max Reitz c293a80927 qcow2: Use intermediate helper CB for amend
If there is more than one time-consuming operation to be performed for
qcow2_amend_options(), we need an intermediate CB which coordinates the
progress of the individual operations and passes the result to the
original status callback.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-12-18 14:34:43 +01:00
Max Reitz 1038bbb803 qcow2: Split upgrade/downgrade paths for amend
If the image version should be upgraded, that is the first we should do;
if it should be downgraded, that is the last we should do. So split the
version change block into an upgrade part at the start and a downgrade
part at the end.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-12-18 14:34:43 +01:00
Max Reitz 164e0f89cc qcow2: Use abort() instead of assert(false)
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-12-18 14:34:43 +01:00
Max Reitz 29d72431ef qcow2: Use error_report() in qcow2_amend_options()
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-12-18 14:34:43 +01:00
Max Reitz 8b13976d3f block: Add opaque value to the amend CB
Add an opaque value which is to be passed to the bdrv_amend_options()
status callback.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-12-18 14:34:43 +01:00
Kevin Wolf c5e8bfb7cd blkdebug: Enable reopen
Just reopening the children (as block.c does now) is enough.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2015-12-18 14:34:43 +01:00
Kevin Wolf 4cdd01d32e block: Pass driver-specific options to .bdrv_refresh_filename()
In order to decide whether a blkdebug: filename can be produced or a
json: one is necessary, blkdebug checked whether bs->options had more
options than just "config", "x-image" or "image" (the latter including
nested options). That doesn't work well when generic block layer options
are present.

This patch passes an option QDict to the driver that contains only
driver-specific options, i.e. the options for the general block layer as
well as child nodes are already filtered out. Works much better this
way.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2015-12-18 14:34:42 +01:00
Kevin Wolf d9b7b05703 block: Allow references for backing files
For bs->file, using references to existing BDSes has been possible for a
while already. This patch enables the same for bs->backing.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-12-18 14:34:42 +01:00
Kevin Wolf 40365552c2 mirror: Error out when a BDS would get two BBs
bdrv_replace_in_backing_chain() asserts that not both old and new
BlockDdriverState have a BlockBackend attached to them because both
would have to end up pointing to the new BDS and we don't support more
than one BB per BDS yet.

Before we can safely allow references to existing nodes as backing
files, we need to make sure that even if a backing file has a BB on it,
this doesn't crash qemu.

There are probably also some cases with the 'replaces' option set where
drive-mirror could fail this assertion today. They are fixed with this
error check as well.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-12-18 14:34:42 +01:00
Kevin Wolf 5365f44dfa qcow2: Add .bdrv_join_options callback
qcow2 accepts a few driver-specific options that overlap semantically
(e.g. "overlap-check" is an alias of "overlap-check.template", and any
missing cache size option is derived from the given ones).

When bdrv_reopen() merges the set of updated options with left out
options that should be kept at their old value, we need to consider this
and filter out any duplicates (which would generally cause errors
because new and old value would contradict each other).

This patch adds a .bdrv_join_options callback to BlockDriver and
implements it for qcow2.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2015-12-18 14:34:42 +01:00
Eric Blake 1310a3d3bd qobject: Rename qtype_code to QType
The name QType matches our CODING_STYLE conventions for type names
in CamelCase.  It also matches the fact that we are already naming
all the enum members with a prefix of QTYPE, not QTYPE_CODE.  And
doing the rename will also make it easier for the next patch to use
QAPI for providing the enum, which also wants CamelCase type names.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1449033659-25497-3-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-12-17 08:21:28 +01:00
Eric Blake 7fb1cf1606 qapi: Don't let implicit enum MAX member collide
Now that we guarantee the user doesn't have any enum values
beginning with a single underscore, we can use that for our
own purposes.  Renaming ENUM_MAX to ENUM__MAX makes it obvious
that the sentinel is generated.

This patch was mostly generated by applying a temporary patch:

|diff --git a/scripts/qapi.py b/scripts/qapi.py
|index e6d014b..b862ec9 100644
|--- a/scripts/qapi.py
|+++ b/scripts/qapi.py
|@@ -1570,6 +1570,7 @@ const char *const %(c_name)s_lookup[] = {
|     max_index = c_enum_const(name, 'MAX', prefix)
|     ret += mcgen('''
|     [%(max_index)s] = NULL,
|+// %(max_index)s
| };
| ''',
|                max_index=max_index)

then running:

$ cat qapi-{types,event}.c tests/test-qapi-types.c |
    sed -n 's,^// \(.*\)MAX,s|\1MAX|\1_MAX|g,p' > list
$ git grep -l _MAX | xargs sed -i -f list

The only things not generated are the changes in scripts/qapi.py.

Rejecting enum members named 'MAX' is now useless, and will be dropped
in the next patch.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1447836791-369-23-git-send-email-eblake@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
[Rebased to current master, commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-12-17 08:21:28 +01:00
Eric Blake a31939e6c8 blkdebug: Merge hand-rolled and qapi BlkdebugEvent enum
No need to keep two separate enums, where editing one is likely
to forget the other.  Now that we can specify a qapi enum prefix,
we don't even have to change the bulk of the uses.

get_event_by_name() could perhaps be replaced by qapi_enum_parse(),
but I left that for another day.

CC: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1447836791-369-20-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-12-17 08:21:27 +01:00
Fam Zheng 61408b250e block: Don't wait serialising for non-COR read requests
The assertion problem was noticed in 06c3916b35, but it wasn't
completely fixed, because even though the req is not marked as
serialising, it still gets serialised by wait_serialising_requests
against other serialising requests, which could lead to the same
assertion failure.

Fix it by even more explicitly skipping the serialising for this
specific case.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1448962590-2842-2-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-12-03 11:08:07 +08:00