Merge tag 'v1.7.0' into xbox17

Conflicts:
	blockdev.c
	hw/audio/ac97.c
espes 2015-06-21 00:56:44 +10:00
commit 282894119a
982 changed files with 55626 additions and 19105 deletions

6
.gitignore vendored

@ -44,8 +44,11 @@ qemu-ga
qemu-bridge-helper
qemu-monitor.texi
vscclient
QMP/qmp-commands.txt
qmp-commands.txt
test-bitops
test-coroutine
test-int128
test-opts-visitor
test-qmp-input-visitor
test-qmp-output-visitor
test-string-input-visitor
@ -79,6 +82,7 @@ fsdev/virtfs-proxy-helper.pod
*.la
*.pc
.libs
.sdk
*.swp
*.orig
.pc

14
.gitmodules vendored

@ -1,27 +1,27 @@
[submodule "roms/vgabios"]
path = roms/vgabios
url = git://git.qemu.org/vgabios.git/
url = git://git.qemu-project.org/vgabios.git/
[submodule "roms/seabios"]
path = roms/seabios
url = git://git.qemu.org/seabios.git/
url = git://git.qemu-project.org/seabios.git/
[submodule "roms/SLOF"]
path = roms/SLOF
url = git://git.qemu.org/SLOF.git
url = git://git.qemu-project.org/SLOF.git
[submodule "roms/ipxe"]
path = roms/ipxe
url = git://git.qemu.org/ipxe.git
url = git://git.qemu-project.org/ipxe.git
[submodule "roms/openbios"]
path = roms/openbios
url = git://git.qemu.org/openbios.git
url = git://git.qemu-project.org/openbios.git
[submodule "roms/qemu-palcode"]
path = roms/qemu-palcode
url = git://github.com/rth7680/qemu-palcode.git
[submodule "roms/sgabios"]
path = roms/sgabios
url = git://git.qemu.org/sgabios.git
url = git://git.qemu-project.org/sgabios.git
[submodule "pixman"]
path = pixman
url = git://anongit.freedesktop.org/pixman
[submodule "dtc"]
path = dtc
url = git://git.qemu.org/dtc.git
url = git://git.qemu-project.org/dtc.git


@ -2,7 +2,8 @@
# into proper addresses so that they are counted properly in git shortlog output.
#
Andrzej Zaborowski <balrogg@gmail.com> balrog <balrog@c046a42c-6fe2-441c-8c8c-71466251a162>
Anthony Liguori <aliguori@us.ibm.com> aliguori <aliguori@c046a42c-6fe2-441c-8c8c-71466251a162>
Anthony Liguori <anthony@codemonkey.ws> aliguori <aliguori@c046a42c-6fe2-441c-8c8c-71466251a162>
Anthony Liguori <anthony@codemonkey.ws> Anthony Liguori <aliguori@us.ibm.com>
Aurelien Jarno <aurelien@aurel32.net> aurel32 <aurel32@c046a42c-6fe2-441c-8c8c-71466251a162>
Blue Swirl <blauwirbel@gmail.com> blueswir1 <blueswir1@c046a42c-6fe2-441c-8c8c-71466251a162>
Edgar E. Iglesias <edgar.iglesias@gmail.com> edgar_igl <edgar_igl@c046a42c-6fe2-441c-8c8c-71466251a162>

71
.travis.yml Normal file

@ -0,0 +1,71 @@
language: c
python:
- "2.4"
compiler:
- gcc
- clang
env:
global:
- TEST_CMD="make check"
- EXTRA_CONFIG=""
# Development packages, EXTRA_PKGS saved for additional builds
- CORE_PKGS="libusb-1.0-0-dev libiscsi-dev librados-dev libncurses5-dev"
- NET_PKGS="libseccomp-dev libgnutls-dev libssh2-1-dev libspice-server-dev libspice-protocol-dev libnss3-dev"
- GUI_PKGS="libgtk-3-dev libvte-2.90-dev libsdl1.2-dev libpng12-dev libpixman-1-dev"
- EXTRA_PKGS=""
matrix:
- TARGETS=alpha-softmmu,alpha-linux-user
- TARGETS=arm-softmmu,arm-linux-user
- TARGETS=cris-softmmu
- TARGETS=i386-softmmu,x86_64-softmmu
- TARGETS=lm32-softmmu
- TARGETS=m68k-softmmu
- TARGETS=microblaze-softmmu,microblazeel-softmmu
- TARGETS=mips-softmmu,mips64-softmmu,mips64el-softmmu,mipsel-softmmu
- TARGETS=moxie-softmmu
- TARGETS=or32-softmmu,
- TARGETS=ppc-softmmu,ppc64-softmmu,ppcemb-softmmu
- TARGETS=s390x-softmmu
- TARGETS=sh4-softmmu,sh4eb-softmmu
- TARGETS=sparc-softmmu,sparc64-softmmu
- TARGETS=unicore32-softmmu
- TARGETS=xtensa-softmmu,xtensaeb-softmmu
before_install:
- git submodule update --init --recursive
- sudo apt-get update -qq
- sudo apt-get install -qq ${CORE_PKGS} ${NET_PKGS} ${GUI_PKGS} ${EXTRA_PKGS}
script: "./configure --target-list=${TARGETS} ${EXTRA_CONFIG} && make && ${TEST_CMD}"
matrix:
# We manually include a number of additional builds for non-standard bits
include:
# Debug related options
- env: TARGETS=i386-softmmu,x86_64-softmmu
EXTRA_CONFIG="--enable-debug"
compiler: gcc
- env: TARGETS=i386-softmmu,x86_64-softmmu
EXTRA_CONFIG="--enable-debug --enable-tcg-interpreter"
compiler: gcc
# Currently configure doesn't force --disable-pie
- env: TARGETS=i386-softmmu,x86_64-softmmu
EXTRA_CONFIG="--enable-gprof --enable-gcov --disable-pie"
compiler: gcc
- env: TARGETS=i386-softmmu,x86_64-softmmu
EXTRA_PKGS="sparse"
EXTRA_CONFIG="--enable-sparse"
compiler: gcc
# All the trace backends (apart from dtrace)
- env: TARGETS=i386-softmmu,x86_64-softmmu
EXTRA_CONFIG="--enable-trace-backend=stderr"
compiler: gcc
- env: TARGETS=i386-softmmu,x86_64-softmmu
EXTRA_CONFIG="--enable-trace-backend=simple"
compiler: gcc
- env: TARGETS=i386-softmmu,x86_64-softmmu
EXTRA_CONFIG="--enable-trace-backend=ftrace"
TEST_CMD=""
compiler: gcc
# This disabled make check for the ftrace backend which needs more setting up
# Currently broken on 12.04 due to mis-packaged liburcu and changed API, will be pulled.
#- env: TARGETS=i386-softmmu,x86_64-softmmu
# EXTRA_PKGS="liblttng-ust-dev liburcu-dev"
# EXTRA_CONFIG="--enable-trace-backend=ust"


@ -1,6 +1,6 @@
This file documents changes for QEMU releases 0.12 and earlier.
For changelog information for later releases, see
http://wiki.qemu.org/ChangeLog or look at the git history for
http://wiki.qemu-project.org/ChangeLog or look at the git history for
more detailed information.


@ -50,8 +50,7 @@ Descriptions of section entries:
General Project Administration
------------------------------
M: Anthony Liguori <aliguori@us.ibm.com>
M: Paul Brook <paul@codesourcery.com>
M: Anthony Liguori <aliguori@amazon.com>
Guest CPU cores (TCG):
----------------------
@ -62,7 +61,6 @@ F: target-alpha/
F: hw/alpha/
ARM
M: Paul Brook <paul@codesourcery.com>
M: Peter Maydell <peter.maydell@linaro.org>
S: Maintained
F: target-arm/
@ -83,8 +81,7 @@ F: hw/lm32/
F: hw/char/lm32_*
M68K
M: Paul Brook <paul@codesourcery.com>
S: Odd Fixes
S: Orphan
F: target-m68k/
F: hw/m68k/
@ -248,7 +245,6 @@ F: hw/*/imx*
F: hw/arm/kzm.c
Integrator CP
M: Paul Brook <paul@codesourcery.com>
M: Peter Maydell <peter.maydell@linaro.org>
S: Maintained
F: hw/arm/integratorcp.c
@ -274,7 +270,6 @@ S: Maintained
F: hw/arm/palm.c
Real View
M: Paul Brook <paul@codesourcery.com>
M: Peter Maydell <peter.maydell@linaro.org>
S: Maintained
F: hw/arm/realview*
@ -285,13 +280,11 @@ S: Maintained
F: hw/arm/spitz.c
Stellaris
M: Paul Brook <paul@codesourcery.com>
M: Peter Maydell <peter.maydell@linaro.org>
S: Maintained
F: hw/*/stellaris*
Versatile PB
M: Paul Brook <paul@codesourcery.com>
M: Peter Maydell <peter.maydell@linaro.org>
S: Maintained
F: hw/*/versatile*
@ -327,18 +320,15 @@ F: hw/lm32/milkymist.c
M68K Machines
-------------
an5206
M: Paul Brook <paul@codesourcery.com>
S: Maintained
S: Orphan
F: hw/m68k/an5206.c
dummy_m68k
M: Paul Brook <paul@codesourcery.com>
S: Maintained
S: Orphan
F: hw/m68k/dummy_m68k.c
mcf5208
M: Paul Brook <paul@codesourcery.com>
S: Maintained
S: Orphan
F: hw/m68k/mcf5208.c
MicroBlaze Machines
@ -509,7 +499,7 @@ F: hw/unicore32/
X86 Machines
------------
PC
M: Anthony Liguori <aliguori@us.ibm.com>
M: Anthony Liguori <aliguori@amazon.com>
S: Supported
F: hw/i386/pc.[ch]
F: hw/i386/pc_piix.c
@ -567,8 +557,7 @@ F: hw/scsi/*
T: git git://github.com/bonzini/qemu.git scsi-next
LSI53C895A
M: Paul Brook <paul@codesourcery.com>
S: Odd Fixes
S: Orphan
F: hw/scsi/lsi53c895a.c
SSI
@ -593,7 +582,7 @@ S: Supported
F: hw/*/*vhost*
virtio
M: Anthony Liguori <aliguori@us.ibm.com>
M: Anthony Liguori <aliguori@amazon.com>
S: Supported
F: hw/*/virtio*
@ -638,6 +627,7 @@ Subsystems
----------
Audio
M: Vassili Karpov (malc) <av1474@comtv.ru>
M: Gerd Hoffmann <kraxel@redhat.com>
S: Maintained
F: audio/
F: hw/audio/
@ -649,9 +639,11 @@ S: Supported
F: block*
F: block/
F: hw/block/
T: git git://repo.or.cz/qemu/kevin.git block
T: git git://github.com/stefanha/qemu.git block
Character Devices
M: Anthony Liguori <aliguori@us.ibm.com>
M: Anthony Liguori <aliguori@amazon.com>
S: Maintained
F: qemu-char.c
@ -689,7 +681,7 @@ F: audio/spiceaudio.c
F: hw/display/qxl*
Graphics
M: Anthony Liguori <aliguori@us.ibm.com>
M: Anthony Liguori <aliguori@amazon.com>
S: Maintained
F: ui/
@ -699,7 +691,7 @@ S: Odd Fixes
F: ui/cocoa.m
Main loop
M: Anthony Liguori <aliguori@us.ibm.com>
M: Anthony Liguori <aliguori@amazon.com>
S: Supported
F: vl.c
@ -709,9 +701,10 @@ S: Supported
F: monitor.c
F: hmp.c
F: hmp-commands.hx
T: git git://repo.or.cz/qemu/qmp-unstable.git queue/qmp
Network device layer
M: Anthony Liguori <aliguori@us.ibm.com>
M: Anthony Liguori <aliguori@amazon.com>
M: Stefan Hajnoczi <stefanha@redhat.com>
S: Maintained
F: net/
@ -730,6 +723,7 @@ M: Luiz Capitulino <lcapitulino@redhat.com>
M: Michael Roth <mdroth@linux.vnet.ibm.com>
S: Supported
F: qapi/
T: git git://repo.or.cz/qemu/qmp-unstable.git queue/qmp
QAPI Schema
M: Eric Blake <eblake@redhat.com>
@ -737,6 +731,7 @@ M: Luiz Capitulino <lcapitulino@redhat.com>
M: Markus Armbruster <armbru@redhat.com>
S: Supported
F: qapi-schema.json
T: git git://repo.or.cz/qemu/qmp-unstable.git queue/qmp
QMP
M: Luiz Capitulino <lcapitulino@redhat.com>
@ -745,6 +740,7 @@ F: qmp.c
F: monitor.c
F: qmp-commands.hx
F: QMP/
T: git git://repo.or.cz/qemu/qmp-unstable.git queue/qmp
SLIRP
M: Jan Kiszka <jan.kiszka@siemens.com>
@ -766,6 +762,12 @@ M: Blue Swirl <blauwirbel@gmail.com>
S: Odd Fixes
F: scripts/checkpatch.pl
Seccomp
M: Eduardo Otubo <otubo@linux.vnet.ibm.com>
S: Supported
F: qemu-seccomp.c
F: include/sysemu/seccomp.h
Usermode Emulation
------------------
BSD user
@ -797,11 +799,6 @@ M: Andrzej Zaborowski <balrogg@gmail.com>
S: Maintained
F: tcg/arm/
HPPA target
M: Richard Henderson <rth@twiddle.net>
S: Maintained
F: tcg/hppa/
i386 target
M: qemu-devel@nongnu.org
S: Maintained
@ -842,25 +839,67 @@ TCI target
M: Stefan Weil <sw@weilnetz.de>
S: Maintained
F: tcg/tci/
F: tci.c
Stable branches
---------------
Stable 1.0
L: qemu-stable@nongnu.org
T: git git://git.qemu.org/qemu-stable-1.0.git
T: git git://git.qemu-project.org/qemu-stable-1.0.git
S: Orphan
Stable 0.15
L: qemu-stable@nongnu.org
T: git git://git.qemu.org/qemu-stable-0.15.git
S: Orphan
M: Andreas Färber <afaerber@suse.de>
T: git git://git.qemu-project.org/qemu-stable-0.15.git
S: Supported
Stable 0.14
L: qemu-stable@nongnu.org
T: git git://git.qemu.org/qemu-stable-0.14.git
T: git git://git.qemu-project.org/qemu-stable-0.14.git
S: Orphan
Stable 0.10
L: qemu-stable@nongnu.org
T: git git://git.qemu.org/qemu-stable-0.10.git
T: git git://git.qemu-project.org/qemu-stable-0.10.git
S: Orphan
Block drivers
-------------
VMDK
M: Fam Zheng <famz@redhat.com>
S: Supported
F: block/vmdk.c
RBD
M: Josh Durgin <josh.durgin@inktank.com>
S: Supported
F: block/rbd.c
Sheepdog
M: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
M: Liu Yuan <namei.unix@gmail.com>
S: Supported
F: block/sheepdog.c
VHDX
M: Jeff Cody <jcody@redhat.com>
S: Supported
F: block/vhdx*
VDI
M: Stefan Weil <sw@weilnetz.de>
S: Maintained
F: block/vdi.c
iSCSI
M: Ronnie Sahlberg <ronniesahlberg@gmail.com>
M: Paolo Bonzini <pbonzini@redhat.com>
M: Peter Lieven <pl@kamp.de>
S: Supported
F: block/iscsi.c
SSH
M: Richard W.M. Jones <rjones@redhat.com>
S: Supported
F: block/ssh.c


@ -28,7 +28,14 @@ CONFIG_ALL=y
include $(SRC_PATH)/rules.mak
config-host.mak: $(SRC_PATH)/configure
@echo $@ is out-of-date, running configure
@sed -n "/.*Configured with/s/[^:]*: //p" $@ | sh
@# TODO: The next lines include code which supports a smooth
@# transition from old configurations without config.status.
@# This code can be removed after QEMU 1.7.
@if test -x config.status; then \
./config.status; \
else \
sed -n "/.*Configured with/s/[^:]*: //p" $@ | sh; \
fi
else
config-host.mak:
ifneq ($(filter-out %clean,$(MAKECMDGOALS)),$(if $(MAKECMDGOALS),,fail))
@ -65,7 +72,7 @@ LIBS+=-lz $(LIBS_TOOLS)
HELPERS-$(CONFIG_LINUX) = qemu-bridge-helper$(EXESUF)
ifdef BUILD_DOCS
DOCS=qemu-doc.html qemu-tech.html qemu.1 qemu-img.1 qemu-nbd.8 QMP/qmp-commands.txt
DOCS=qemu-doc.html qemu-tech.html qemu.1 qemu-img.1 qemu-nbd.8 qmp-commands.txt
ifdef CONFIG_VIRTFS
DOCS+=fsdev/virtfs-proxy-helper.1
endif
@ -168,7 +175,9 @@ recurse-all: $(SUBDIR_RULES) $(ROMSUBDIR_RULES)
bt-host.o: QEMU_CFLAGS += $(BLUEZ_CFLAGS)
$(BUILD_DIR)/version.o: $(SRC_PATH)/version.rc $(BUILD_DIR)/config-host.h | $(BUILD_DIR)/version.lo
$(call quiet-command,$(WINDRES) -I$(BUILD_DIR) -o $@ $<," RC version.o")
$(BUILD_DIR)/version.lo: $(SRC_PATH)/version.rc $(BUILD_DIR)/config-host.h
$(call quiet-command,$(WINDRES) -I$(BUILD_DIR) -o $@ $<," RC version.lo")
Makefile: $(version-obj-y) $(version-lobj-y)
@ -233,8 +242,9 @@ clean:
rm -f qemu-options.def
find . -name '*.[oda]' -type f -exec rm -f {} +
find . -name '*.l[oa]' -type f -exec rm -f {} +
rm -f $(TOOLS) $(HELPERS-y) qemu-ga TAGS cscope.* *.pod *~ */*~
rm -Rf .libs
rm -f $(filter-out %.tlb,$(TOOLS)) $(HELPERS-y) qemu-ga TAGS cscope.* *.pod *~ */*~
rm -f fsdev/*.pod
rm -rf .libs */.libs
rm -f qemu-img-cmds.h
@# May not be present in GENERATED_HEADERS
rm -f trace/generated-tracers-dtrace.dtrace*
@ -243,7 +253,6 @@ clean:
rm -f $(foreach f,$(GENERATED_SOURCES),$(f) $(f)-timestamp)
rm -rf qapi-generated
rm -rf qga/qapi-generated
$(MAKE) -C tests/tcg clean
for d in $(ALL_SUBDIRS); do \
if test -d $$d; then $(MAKE) -C $$d $@ || exit 1; fi; \
rm -f $$d/qemu-options.def; \
@ -259,6 +268,7 @@ qemu-%.tar.bz2:
distclean: clean
rm -f config-host.mak config-host.h* config-host.ld $(DOCS) qemu-options.texi qemu-img-cmds.texi qemu-monitor.texi
rm -f config-all-devices.mak config-all-disas.mak
rm -f po/*.mo
rm -f roms/seabios/config.mak roms/vgabios/config.mak
rm -f qemu-doc.info qemu-doc.aux qemu-doc.cp qemu-doc.cps qemu-doc.dvi
rm -f qemu-doc.fn qemu-doc.fns qemu-doc.info qemu-doc.ky qemu-doc.kys
@ -270,19 +280,20 @@ distclean: clean
for d in $(TARGET_DIRS); do \
rm -rf $$d || exit 1 ; \
done
rm -Rf .sdk
if test -f pixman/config.log; then make -C pixman distclean; fi
if test -f dtc/version_gen.h; then make $(DTC_MAKE_ARGS) clean; fi
KEYMAPS=da en-gb et fr fr-ch is lt modifiers no pt-br sv \
ar de en-us fi fr-be hr it lv nl pl ru th \
common de-ch es fo fr-ca hu ja mk nl-be pt sl tr \
bepo
bepo cz
ifdef INSTALL_BLOBS
BLOBS=bios.bin sgabios.bin vgabios.bin vgabios-cirrus.bin \
vgabios-stdvga.bin vgabios-vmware.bin vgabios-qxl.bin \
acpi-dsdt.aml q35-acpi-dsdt.aml \
ppc_rom.bin openbios-sparc32 openbios-sparc64 openbios-ppc \
ppc_rom.bin openbios-sparc32 openbios-sparc64 openbios-ppc QEMU,tcx.bin \
pxe-e1000.rom pxe-eepro100.rom pxe-ne2k_pci.rom \
pxe-pcnet.rom pxe-rtl8139.rom pxe-virtio.rom \
efi-e1000.rom efi-eepro100.rom efi-ne2k_pci.rom \
@ -301,7 +312,7 @@ endif
install-doc: $(DOCS)
$(INSTALL_DIR) "$(DESTDIR)$(qemu_docdir)"
$(INSTALL_DATA) qemu-doc.html qemu-tech.html "$(DESTDIR)$(qemu_docdir)"
$(INSTALL_DATA) QMP/qmp-commands.txt "$(DESTDIR)$(qemu_docdir)"
$(INSTALL_DATA) qmp-commands.txt "$(DESTDIR)$(qemu_docdir)"
ifdef CONFIG_POSIX
$(INSTALL_DIR) "$(DESTDIR)$(mandir)/man1"
$(INSTALL_DATA) qemu.1 "$(DESTDIR)$(mandir)/man1"
@ -395,7 +406,7 @@ qemu-options.texi: $(SRC_PATH)/qemu-options.hx
qemu-monitor.texi: $(SRC_PATH)/hmp-commands.hx
$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -t < $< > $@," GEN $@")
QMP/qmp-commands.txt: $(SRC_PATH)/qmp-commands.hx
qmp-commands.txt: $(SRC_PATH)/qmp-commands.hx
$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -q < $< > $@," GEN $@")
qemu-img-cmds.texi: $(SRC_PATH)/qemu-img-cmds.hx


@ -109,6 +109,7 @@ version-lobj-$(CONFIG_WIN32) += $(BUILD_DIR)/version.lo
# FIXME: a few definitions from qapi-types.o/qapi-visit.o are needed
# by libqemuutil.a. These should be moved to a separate .json schema.
qga-obj-y = qga/ qapi-types.o qapi-visit.o
qga-vss-dll-obj-y = qga/
vl.o: QEMU_CFLAGS+=$(GPROF_CFLAGS)
@ -120,6 +121,7 @@ nested-vars += \
stub-obj-y \
util-obj-y \
qga-obj-y \
qga-vss-dll-obj-y \
block-obj-y \
common-obj-y
dummy := $(call unnest-vars)


@ -70,10 +70,6 @@ all: $(PROGS) stap
# Dummy command so that make thinks it has done something
@true
CONFIG_NO_PCI = $(if $(subst n,,$(CONFIG_PCI)),n,y)
CONFIG_NO_KVM = $(if $(subst n,,$(CONFIG_KVM)),n,y)
CONFIG_NO_XEN = $(if $(subst n,,$(CONFIG_XEN)),n,y)
#########################################################
# cpu emulator library
obj-y = exec.o translate-all.o cpu-exec.o
@ -83,8 +79,8 @@ obj-$(CONFIG_TCG_INTERPRETER) += disas/tci.o
obj-y += fpu/softfloat.o
obj-y += target-$(TARGET_BASE_ARCH)/
obj-y += disas.o
obj-$(CONFIG_GDBSTUB_XML) += gdbstub-xml.o
obj-$(CONFIG_NO_KVM) += kvm-stub.o
obj-$(call notempty,$(TARGET_XML_FILES)) += gdbstub-xml.o
obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
#########################################################
# Linux user emulator target
@ -125,7 +121,7 @@ LIBS+=$(libs_softmmu)
# xen support
obj-$(CONFIG_XEN) += xen-all.o xen-mapcache.o
obj-$(CONFIG_NO_XEN) += xen-stub.o
obj-$(call lnot,$(CONFIG_XEN)) += xen-stub.o
# Hardware support
ifeq ($(TARGET_NAME), sparc64)


@ -1,88 +0,0 @@
QEMU Monitor Protocol
=====================
Introduction
-------------
The QEMU Monitor Protocol (QMP) allows applications to communicate with
QEMU's Monitor.
QMP is JSON[1] based and currently has the following features:
- Lightweight, text-based, easy to parse data format
- Asynchronous messages support (ie. events)
- Capabilities Negotiation
For detailed information on QMP's usage, please, refer to the following files:
o qmp-spec.txt QEMU Monitor Protocol current specification
o qmp-commands.txt QMP supported commands (auto-generated at build-time)
o qmp-events.txt List of available asynchronous events
There is also a simple Python script called 'qmp-shell' available.
IMPORTANT: It's strongly recommended to read the 'Stability Considerations'
section in the qmp-commands.txt file before making any serious use of QMP.
[1] http://www.json.org
Usage
-----
To enable QMP, you need a QEMU monitor instance in "control mode". There are
two ways of doing this.
The simplest one is using the '-qmp' command-line option. The following
example makes QMP available on localhost port 4444:
$ qemu [...] -qmp tcp:localhost:4444,server
However, in order to have more complex combinations, like multiple monitors,
the '-mon' command-line option should be used along with the '-chardev' one.
For instance, the following example creates one user monitor on stdio and one
QMP monitor on localhost port 4444.
$ qemu [...] -chardev stdio,id=mon0 -mon chardev=mon0,mode=readline \
-chardev socket,id=mon1,host=localhost,port=4444,server \
-mon chardev=mon1,mode=control
Please, refer to QEMU's manpage for more information.
Simple Testing
--------------
To manually test QMP one can connect with telnet and issue commands by hand:
$ telnet localhost 4444
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 13, "major": 0}, "package": ""}, "capabilities": []}}
{ "execute": "qmp_capabilities" }
{"return": {}}
{ "execute": "query-version" }
{"return": {"qemu": {"micro": 50, "minor": 13, "major": 0}, "package": ""}}
Development Process
-------------------
When changing QMP's interface (by adding new commands, events or modifying
existing ones) it's mandatory to update the relevant documentation, which is
one (or more) of the files listed in the 'Introduction' section*.
Also, it's strongly recommended to send the documentation patch first, before
doing any code change. This is so because:
1. Avoids the code dictating the interface
2. Review can improve your interface. Letting that happen before
you implement it can save you work.
* The qmp-commands.txt file is generated from the qmp-commands.hx one, which
is the file that should be edited.
Homepage
--------
http://wiki.qemu.org/QMP

2
README

@ -1,3 +1,3 @@
Read the documentation in qemu-doc.html or on http://wiki.qemu.org
Read the documentation in qemu-doc.html or on http://wiki.qemu-project.org
- QEMU team


@ -1 +1 @@
1.6.0
1.7.0


@ -23,7 +23,6 @@ struct AioHandler
GPollFD pfd;
IOHandler *io_read;
IOHandler *io_write;
AioFlushHandler *io_flush;
int deleted;
int pollfds_idx;
void *opaque;
@ -47,7 +46,6 @@ void aio_set_fd_handler(AioContext *ctx,
int fd,
IOHandler *io_read,
IOHandler *io_write,
AioFlushHandler *io_flush,
void *opaque)
{
AioHandler *node;
@ -84,7 +82,6 @@ void aio_set_fd_handler(AioContext *ctx,
/* Update handler with latest information */
node->io_read = io_read;
node->io_write = io_write;
node->io_flush = io_flush;
node->opaque = opaque;
node->pollfds_idx = -1;
@ -97,12 +94,10 @@ void aio_set_fd_handler(AioContext *ctx,
void aio_set_event_notifier(AioContext *ctx,
EventNotifier *notifier,
EventNotifierHandler *io_read,
AioFlushEventNotifierHandler *io_flush)
EventNotifierHandler *io_read)
{
aio_set_fd_handler(ctx, event_notifier_get_fd(notifier),
(IOHandler *)io_read, NULL,
(AioFlushHandler *)io_flush, notifier);
(IOHandler *)io_read, NULL, notifier);
}
bool aio_pending(AioContext *ctx)
@ -147,7 +142,11 @@ static bool aio_dispatch(AioContext *ctx)
(revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) &&
node->io_read) {
node->io_read(node->opaque);
progress = true;
/* aio_notify() does not count as progress */
if (node->opaque != &ctx->notifier) {
progress = true;
}
}
if (!node->deleted &&
(revents & (G_IO_OUT | G_IO_ERR)) &&
@ -166,6 +165,10 @@ static bool aio_dispatch(AioContext *ctx)
g_free(tmp);
}
}
/* Run our timers */
progress |= timerlistgroup_run_timers(&ctx->tlg);
return progress;
}
@ -173,7 +176,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
{
AioHandler *node;
int ret;
bool busy, progress;
bool progress;
progress = false;
@ -200,20 +203,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
g_array_set_size(ctx->pollfds, 0);
/* fill pollfds */
busy = false;
QLIST_FOREACH(node, &ctx->aio_handlers, node) {
node->pollfds_idx = -1;
/* If there aren't pending AIO operations, don't invoke callbacks.
* Otherwise, if there are no AIO requests, qemu_aio_wait() would
* wait indefinitely.
*/
if (!node->deleted && node->io_flush) {
if (node->io_flush(node->opaque) == 0) {
continue;
}
busy = true;
}
if (!node->deleted && node->pfd.events) {
GPollFD pfd = {
.fd = node->pfd.fd,
@ -226,15 +217,15 @@ bool aio_poll(AioContext *ctx, bool blocking)
ctx->walking_handlers--;
/* No AIO operations? Get us out of here */
if (!busy) {
/* early return if we only have the aio_notify() fd */
if (ctx->pollfds->len == 1) {
return progress;
}
/* wait until next event */
ret = g_poll((GPollFD *)ctx->pollfds->data,
ctx->pollfds->len,
blocking ? -1 : 0);
ret = qemu_poll_ns((GPollFD *)ctx->pollfds->data,
ctx->pollfds->len,
blocking ? timerlistgroup_deadline_ns(&ctx->tlg) : 0);
/* if we have any readable fds, dispatch event */
if (ret > 0) {
@ -245,11 +236,12 @@ bool aio_poll(AioContext *ctx, bool blocking)
node->pfd.revents = pfd->revents;
}
}
if (aio_dispatch(ctx)) {
progress = true;
}
}
assert(progress || busy);
return true;
/* Run dispatch even if there were no readable fds to run timers */
if (aio_dispatch(ctx)) {
progress = true;
}
return progress;
}


@ -23,7 +23,6 @@
struct AioHandler {
EventNotifier *e;
EventNotifierHandler *io_notify;
AioFlushEventNotifierHandler *io_flush;
GPollFD pfd;
int deleted;
QLIST_ENTRY(AioHandler) node;
@ -31,8 +30,7 @@ struct AioHandler {
void aio_set_event_notifier(AioContext *ctx,
EventNotifier *e,
EventNotifierHandler *io_notify,
AioFlushEventNotifierHandler *io_flush)
EventNotifierHandler *io_notify)
{
AioHandler *node;
@ -73,7 +71,6 @@ void aio_set_event_notifier(AioContext *ctx,
}
/* Update handler with latest information */
node->io_notify = io_notify;
node->io_flush = io_flush;
}
aio_notify(ctx);
@ -96,8 +93,9 @@ bool aio_poll(AioContext *ctx, bool blocking)
{
AioHandler *node;
HANDLE events[MAXIMUM_WAIT_OBJECTS + 1];
bool busy, progress;
bool progress;
int count;
int timeout;
progress = false;
@ -111,6 +109,9 @@ bool aio_poll(AioContext *ctx, bool blocking)
progress = true;
}
/* Run timers */
progress |= timerlistgroup_run_timers(&ctx->tlg);
/*
* Then dispatch any pending callbacks from the GSource.
*
@ -126,7 +127,11 @@ bool aio_poll(AioContext *ctx, bool blocking)
if (node->pfd.revents && node->io_notify) {
node->pfd.revents = 0;
node->io_notify(node->e);
progress = true;
/* aio_notify() does not count as progress */
if (node->e != &ctx->notifier) {
progress = true;
}
}
tmp = node;
@ -147,19 +152,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
ctx->walking_handlers++;
/* fill fd sets */
busy = false;
count = 0;
QLIST_FOREACH(node, &ctx->aio_handlers, node) {
/* If there aren't pending AIO operations, don't invoke callbacks.
* Otherwise, if there are no AIO requests, qemu_aio_wait() would
* wait indefinitely.
*/
if (!node->deleted && node->io_flush) {
if (node->io_flush(node->e) == 0) {
continue;
}
busy = true;
}
if (!node->deleted && node->io_notify) {
events[count++] = event_notifier_get_handle(node->e);
}
@ -167,15 +161,18 @@ bool aio_poll(AioContext *ctx, bool blocking)
ctx->walking_handlers--;
/* No AIO operations? Get us out of here */
if (!busy) {
/* early return if we only have the aio_notify() fd */
if (count == 1) {
return progress;
}
/* wait until next event */
while (count > 0) {
int timeout = blocking ? INFINITE : 0;
int ret = WaitForMultipleObjects(count, events, FALSE, timeout);
int ret;
timeout = blocking ?
qemu_timeout_ns_to_ms(timerlistgroup_deadline_ns(&ctx->tlg)) : 0;
ret = WaitForMultipleObjects(count, events, FALSE, timeout);
/* if we have any signaled events, dispatch event */
if ((DWORD) (ret - WAIT_OBJECT_0) >= count) {
@ -196,7 +193,11 @@ bool aio_poll(AioContext *ctx, bool blocking)
event_notifier_get_handle(node->e) == events[ret - WAIT_OBJECT_0] &&
node->io_notify) {
node->io_notify(node->e);
progress = true;
/* aio_notify() does not count as progress */
if (node->e != &ctx->notifier) {
progress = true;
}
}
tmp = node;
@ -214,6 +215,14 @@ bool aio_poll(AioContext *ctx, bool blocking)
events[ret - WAIT_OBJECT_0] = events[--count];
}
assert(progress || busy);
return true;
if (blocking) {
/* Run the timers a second time. We do this because otherwise aio_wait
* will not note progress - and will stop a drain early - if we have
* a timer that was not ready to run entering g_poll but is ready
* after g_poll. This will only do anything if a timer has expired.
*/
progress |= timerlistgroup_run_timers(&ctx->tlg);
}
return progress;
}


@ -150,10 +150,9 @@ int qemu_read_default_config_files(bool userconfig)
return 0;
}
static inline bool is_zero_page(uint8_t *p)
static inline bool is_zero_range(uint8_t *p, uint64_t size)
{
return buffer_find_nonzero_offset(p, TARGET_PAGE_SIZE) ==
TARGET_PAGE_SIZE;
return buffer_find_nonzero_offset(p, size) == size;
}
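
Note on the hunk above: `is_zero_page()` becomes `is_zero_range()`, which takes an explicit size instead of assuming `TARGET_PAGE_SIZE`. The following is a minimal, self-contained sketch of that idea, not QEMU's code. `find_nonzero_offset()` and `is_zero_range_sketch()` are hypothetical names, and the byte-wise loop is an assumed stand-in for `buffer_find_nonzero_offset()`, whose implementation does not appear in this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for buffer_find_nonzero_offset(): returns the
 * offset of the first nonzero byte, or len if every byte is zero. */
static size_t find_nonzero_offset(const uint8_t *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (buf[i] != 0) {
            return i;
        }
    }
    return len;
}

/* Same shape as the new helper: the caller decides how many bytes to
 * check, so the test is no longer tied to one page size. */
static bool is_zero_range_sketch(const uint8_t *p, size_t size)
{
    return find_nonzero_offset(p, size) == size;
}
```

With this shape, a page-sized check is simply a call with the page size as `size`, which matches how the later hunks call `is_zero_range(p, TARGET_PAGE_SIZE)`.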
/* struct contains XBZRLE cache and a static page
@ -342,7 +341,8 @@ ram_addr_t migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
{
unsigned long base = mr->ram_addr >> TARGET_PAGE_BITS;
unsigned long nr = base + (start >> TARGET_PAGE_BITS);
unsigned long size = base + (int128_get64(mr->size) >> TARGET_PAGE_BITS);
uint64_t mr_size = TARGET_PAGE_ALIGN(memory_region_size(mr));
unsigned long size = base + (mr_size >> TARGET_PAGE_BITS);
unsigned long next;
@ -392,7 +392,7 @@ static void migration_bitmap_sync(void)
}
if (!start_time) {
start_time = qemu_get_clock_ms(rt_clock);
start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
}
trace_migration_bitmap_sync_start();
@ -410,7 +410,7 @@ static void migration_bitmap_sync(void)
trace_migration_bitmap_sync_end(migration_dirty_pages
- num_dirty_pages_init);
num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init;
end_time = qemu_get_clock_ms(rt_clock);
end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
/* more than 1 second = 1000 milliseconds */
if (end_time > start_time + 1000) {
@ -496,7 +496,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
acct_info.dup_pages++;
}
}
} else if (is_zero_page(p)) {
} else if (is_zero_range(p, TARGET_PAGE_SIZE)) {
acct_info.dup_pages++;
bytes_sent = save_block_hdr(f, block, offset, cont,
RAM_SAVE_FLAG_COMPRESS);
@ -672,7 +672,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
ram_control_before_iterate(f, RAM_CONTROL_ROUND);
t0 = qemu_get_clock_ns(rt_clock);
t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
i = 0;
while ((ret = qemu_file_rate_limit(f)) == 0) {
int bytes_sent;
@ -691,7 +691,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
iterations
*/
if ((i & 63) == 0) {
uint64_t t1 = (qemu_get_clock_ns(rt_clock) - t0) / 1000000;
uint64_t t1 = (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - t0) / 1000000;
if (t1 > MAX_WAIT) {
DPRINTF("big wait: %" PRIu64 " milliseconds, %d iterations\n",
t1, i);
@ -709,15 +709,20 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
*/
ram_control_after_iterate(f, RAM_CONTROL_ROUND);
bytes_transferred += total_sent;
/*
* Do not count these 8 bytes into total_sent, so that we can
* return 0 if no page had been dirtied.
*/
qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
bytes_transferred += 8;
ret = qemu_file_get_error(f);
if (ret < 0) {
bytes_transferred += total_sent;
return ret;
}
qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
total_sent += 8;
bytes_transferred += total_sent;
return total_sent;
}
@ -843,15 +848,8 @@ static inline void *host_from_stream_offset(QEMUFile *f,
*/
void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
{
if (ch != 0 || !is_zero_page(host)) {
if (ch != 0 || !is_zero_range(host, size)) {
memset(host, ch, size);
#ifndef _WIN32
if (ch == 0 &&
(!kvm_enabled() || kvm_has_sync_mmu()) &&
getpagesize() <= TARGET_PAGE_SIZE) {
qemu_madvise(host, TARGET_PAGE_SIZE, QEMU_MADV_DONTNEED);
}
#endif
}
}
@ -1112,9 +1110,6 @@ int qemu_uuid_parse(const char *str, uint8_t *uuid)
if (ret != 16) {
return -1;
}
#ifdef TARGET_I386
smbios_add_field(1, offsetof(struct smbios_type_1, uuid), uuid, 16);
#endif
return 0;
}
@ -1125,20 +1120,18 @@ void do_acpitable_option(const QemuOpts *opts)
acpi_table_add(opts, &err);
if (err) {
fprintf(stderr, "Wrong acpi table provided: %s\n",
error_get_pretty(err));
error_report("Wrong acpi table provided: %s",
error_get_pretty(err));
error_free(err);
exit(1);
}
#endif
}
void do_smbios_option(const char *optarg)
void do_smbios_option(QemuOpts *opts)
{
#ifdef TARGET_I386
if (smbios_entry_add(optarg) < 0) {
exit(1);
}
smbios_entry_add(opts);
#endif
}
@ -1195,15 +1188,14 @@ static void mig_sleep_cpu(void *opq)
much time in the VM. The migration thread will try to catchup.
Workload will experience a performance drop.
*/
static void mig_throttle_cpu_down(CPUState *cpu, void *data)
{
async_run_on_cpu(cpu, mig_sleep_cpu, NULL);
}
static void mig_throttle_guest_down(void)
{
CPUState *cpu;
qemu_mutex_lock_iothread();
qemu_for_each_cpu(mig_throttle_cpu_down, NULL);
CPU_FOREACH(cpu) {
async_run_on_cpu(cpu, mig_sleep_cpu, NULL);
}
qemu_mutex_unlock_iothread();
}
@ -1217,11 +1209,11 @@ static void check_guest_throttling(void)
}
if (!t0) {
t0 = qemu_get_clock_ns(rt_clock);
t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
return;
}
t1 = qemu_get_clock_ns(rt_clock);
t1 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
/* If it has been more than 40 ms since the last time the guest
* was throttled then do it again.

24
async.c

@ -150,7 +150,10 @@ aio_ctx_prepare(GSource *source, gint *timeout)
{
AioContext *ctx = (AioContext *) source;
QEMUBH *bh;
int deadline;
/* We assume there is no timeout already supplied */
*timeout = -1;
for (bh = ctx->first_bh; bh; bh = bh->next) {
if (!bh->deleted && bh->scheduled) {
if (bh->idle) {
@ -166,6 +169,14 @@ aio_ctx_prepare(GSource *source, gint *timeout)
}
}
deadline = qemu_timeout_ns_to_ms(timerlistgroup_deadline_ns(&ctx->tlg));
if (deadline == 0) {
*timeout = 0;
return true;
} else {
*timeout = qemu_soonest_timeout(*timeout, deadline);
}
return false;
}
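
Note on the hunk above: `aio_ctx_prepare()` starts with `*timeout = -1` (no timeout) and then merges in the timer deadline through `qemu_soonest_timeout()`. That helper's body is not part of this diff, so the following is only a sketch of one way such a merge could work, assuming that `-1` means "block forever" and that all other values are non-negative. `soonest_timeout_sketch()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pick the earlier of two timeouts, where -1 means
 * "no timeout". Cast to unsigned, -1 becomes the largest possible value,
 * so any finite timeout compares as sooner. */
static int64_t soonest_timeout_sketch(int64_t t1, int64_t t2)
{
    assert(t1 >= -1 && t2 >= -1);
    return ((uint64_t)t1 < (uint64_t)t2) ? t1 : t2;
}
```

Under these assumptions, merging an unset timeout (`-1`) with a 5 ms deadline gives 5, and merging two unset timeouts stays at `-1`, so the caller still blocks indefinitely when no timer is armed.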
@ -180,7 +191,7 @@ aio_ctx_check(GSource *source)
return true;
}
}
return aio_pending(ctx);
return aio_pending(ctx) || (timerlistgroup_deadline_ns(&ctx->tlg) == 0);
}
static gboolean
@ -201,10 +212,11 @@ aio_ctx_finalize(GSource *source)
AioContext *ctx = (AioContext *) source;
thread_pool_free(ctx->thread_pool);
aio_set_event_notifier(ctx, &ctx->notifier, NULL, NULL);
aio_set_event_notifier(ctx, &ctx->notifier, NULL);
event_notifier_cleanup(&ctx->notifier);
qemu_mutex_destroy(&ctx->bh_lock);
g_array_free(ctx->pollfds, TRUE);
timerlistgroup_deinit(&ctx->tlg);
}
static GSourceFuncs aio_source_funcs = {
@ -233,6 +245,11 @@ void aio_notify(AioContext *ctx)
event_notifier_set(&ctx->notifier);
}
static void aio_timerlist_notify(void *opaque)
{
aio_notify(opaque);
}
AioContext *aio_context_new(void)
{
AioContext *ctx;
@ -243,7 +260,8 @@ AioContext *aio_context_new(void)
event_notifier_init(&ctx->notifier, false);
aio_set_event_notifier(ctx, &ctx->notifier,
(EventNotifierHandler *)
event_notifier_test_and_clear, NULL);
event_notifier_test_and_clear);
timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
return ctx;
}
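
The aio_ctx_prepare() hunks above start from a timeout of -1 ("block forever") and then fold in the timer deadline. A minimal standalone sketch of that folding step follows. The name soonest_timeout is hypothetical and is not taken from this diff, and the unsigned-cast comparison is an assumption about one way to implement such a helper, not a claim about what qemu_soonest_timeout() does internally.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the diff): return the earlier of two
 * timeouts, where -1 means "no timeout, block forever".
 * Casting to unsigned maps -1 to the largest 64-bit value, so an
 * ordinary minimum treats "forever" as later than any real deadline. */
int64_t soonest_timeout(int64_t a, int64_t b)
{
    assert(a >= -1 && b >= -1);
    return ((uint64_t)a < (uint64_t)b) ? a : b;
}
```

With this shape, a prepare callback can begin with -1 and merge in each deadline source in turn without special-casing the "forever" value.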

@ -1124,10 +1124,11 @@ static int audio_is_timer_needed (void)
static void audio_reset_timer (AudioState *s)
{
if (audio_is_timer_needed ()) {
qemu_mod_timer (s->ts, qemu_get_clock_ns (vm_clock) + 1);
timer_mod (s->ts,
qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + conf.period.ticks);
}
else {
qemu_del_timer (s->ts);
timer_del (s->ts);
}
}
@ -1834,7 +1835,7 @@ static void audio_init (void)
QLIST_INIT (&s->cap_head);
atexit (audio_atexit);
s->ts = qemu_new_timer_ns (vm_clock, audio_timer, s);
s->ts = timer_new_ns(QEMU_CLOCK_VIRTUAL, audio_timer, s);
if (!s->ts) {
hw_error("Could not create audio timer\n");
}

@ -348,7 +348,6 @@ void mixeng_clear (struct st_sample *buf, int len)
void mixeng_volume (struct st_sample *buf, int len, struct mixeng_volume *vol)
{
#ifdef CONFIG_MIXEMU
if (vol->mute) {
mixeng_clear (buf, len);
return;
@ -364,9 +363,4 @@ void mixeng_volume (struct st_sample *buf, int len, struct mixeng_volume *vol)
#endif
buf += 1;
}
#else
(void) buf;
(void) len;
(void) vol;
#endif
}

@ -35,7 +35,7 @@
#define IN_T glue (glue (ITYPE, BSIZE), _t)
#ifdef FLOAT_MIXENG
static mixeng_real inline glue (conv_, ET) (IN_T v)
static inline mixeng_real glue (conv_, ET) (IN_T v)
{
IN_T nv = ENDIAN_CONVERT (v);
@ -54,7 +54,7 @@ static mixeng_real inline glue (conv_, ET) (IN_T v)
#endif
}
static IN_T inline glue (clip_, ET) (mixeng_real v)
static inline IN_T glue (clip_, ET) (mixeng_real v)
{
if (v >= 0.5) {
return IN_MAX;

@ -46,7 +46,7 @@ static int no_run_out (HWVoiceOut *hw, int live)
int64_t ticks;
int64_t bytes;
now = qemu_get_clock_ns (vm_clock);
now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
ticks = now - no->old_ticks;
bytes = muldiv64 (ticks, hw->info.bytes_per_second, get_ticks_per_sec ());
bytes = audio_MIN (bytes, INT_MAX);
@ -102,7 +102,7 @@ static int no_run_in (HWVoiceIn *hw)
int samples = 0;
if (dead) {
int64_t now = qemu_get_clock_ns (vm_clock);
int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
int64_t ticks = now - no->old_ticks;
int64_t bytes =
muldiv64 (ticks, hw->info.bytes_per_second, get_ticks_per_sec ());

@ -849,6 +849,10 @@ static int oss_ctl_in (HWVoiceIn *hw, int cmd, ...)
static void *oss_audio_init (void)
{
if (access(conf.devpath_in, R_OK | W_OK) < 0 ||
access(conf.devpath_out, R_OK | W_OK) < 0) {
return NULL;
}
return &conf;
}

@ -81,7 +81,7 @@ static void spice_audio_fini (void *opaque)
static void rate_start (SpiceRateCtl *rate)
{
memset (rate, 0, sizeof (*rate));
rate->start_ticks = qemu_get_clock_ns (vm_clock);
rate->start_ticks = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
}
static int rate_get_samples (struct audio_pcm_info *info, SpiceRateCtl *rate)
@ -91,7 +91,7 @@ static int rate_get_samples (struct audio_pcm_info *info, SpiceRateCtl *rate)
int64_t bytes;
int64_t samples;
now = qemu_get_clock_ns (vm_clock);
now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
ticks = now - rate->start_ticks;
bytes = muldiv64 (ticks, info->bytes_per_second, get_ticks_per_sec ());
samples = (bytes - rate->bytes_sent) >> info->shift;

@ -52,7 +52,7 @@ static int wav_run_out (HWVoiceOut *hw, int live)
int rpos, decr, samples;
uint8_t *dst;
struct st_sample *src;
int64_t now = qemu_get_clock_ns (vm_clock);
int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
int64_t ticks = now - wav->old_ticks;
int64_t bytes =
muldiv64 (ticks, hw->info.bytes_per_second, get_ticks_per_sec ());

@ -314,9 +314,9 @@ static int baum_eat_packet(BaumDriverState *baum, const uint8_t *buf, int len)
return 0; \
if (*cur++ != ESC) { \
DPRINTF("Broken packet %#2x, tossing\n", req); \
if (qemu_timer_pending(baum->cellCount_timer)) { \
qemu_del_timer(baum->cellCount_timer); \
baum_cellCount_timer_cb(baum); \
if (timer_pending(baum->cellCount_timer)) { \
timer_del(baum->cellCount_timer); \
baum_cellCount_timer_cb(baum); \
} \
return (cur - 2 - buf); \
} \
@ -334,7 +334,7 @@ static int baum_eat_packet(BaumDriverState *baum, const uint8_t *buf, int len)
int i;
/* Allow 100ms to complete the DisplayData packet */
qemu_mod_timer(baum->cellCount_timer, qemu_get_clock_ns(vm_clock) +
timer_mod(baum->cellCount_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
get_ticks_per_sec() / 10);
for (i = 0; i < baum->x * baum->y ; i++) {
EAT(c);
@ -348,7 +348,7 @@ static int baum_eat_packet(BaumDriverState *baum, const uint8_t *buf, int len)
c = '?';
text[i] = c;
}
qemu_del_timer(baum->cellCount_timer);
timer_del(baum->cellCount_timer);
memset(zero, 0, sizeof(zero));
@ -553,7 +553,7 @@ static void baum_close(struct CharDriverState *chr)
{
BaumDriverState *baum = chr->opaque;
qemu_free_timer(baum->cellCount_timer);
timer_free(baum->cellCount_timer);
if (baum->brlapi) {
brlapi__closeConnection(baum->brlapi);
g_free(baum->brlapi);
@ -588,7 +588,7 @@ CharDriverState *chr_baum_init(void)
goto fail_handle;
}
baum->cellCount_timer = qemu_new_timer_ns(vm_clock, baum_cellCount_timer_cb, baum);
baum->cellCount_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, baum_cellCount_timer_cb, baum);
if (brlapi__getDisplaySize(handle, &baum->x, &baum->y) == -1) {
brlapi_perror("baum_init: brlapi_getDisplaySize");
@ -614,7 +614,7 @@ CharDriverState *chr_baum_init(void)
return chr;
fail:
qemu_free_timer(baum->cellCount_timer);
timer_free(baum->cellCount_timer);
brlapi__closeConnection(handle);
fail_handle:
g_free(handle);

@ -91,12 +91,14 @@ static int rng_egd_chr_can_read(void *opaque)
static void rng_egd_chr_read(void *opaque, const uint8_t *buf, int size)
{
RngEgd *s = RNG_EGD(opaque);
size_t buf_offset = 0;
while (size > 0 && s->requests) {
RngRequest *req = s->requests->data;
int len = MIN(size, req->size - req->offset);
memcpy(req->data + req->offset, buf, len);
memcpy(req->data + req->offset, buf + buf_offset, len);
buf_offset += len;
req->offset += len;
size -= len;
@ -167,7 +169,6 @@ static void rng_egd_set_chardev(Object *obj, const char *value, Error **errp)
if (b->opened) {
error_set(errp, QERR_PERMISSION_DENIED);
} else {
g_free(s->chr_name);
s->chr_name = g_strdup(value);
}
}
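
The rng_egd_chr_read() hunk above replaces a copy from buf with a copy from buf + buf_offset, so that each pending request receives fresh input bytes instead of the start of the buffer again. A simplified, self-contained sketch of that loop is shown below. The struct request type and the fill_requests function are hypothetical stand-ins (an array instead of the GSList used in the real code), intended only to illustrate the offset bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request record, loosely modelled on the diff's RngRequest. */
struct request {
    unsigned char *data;  /* destination buffer */
    size_t size;          /* total bytes wanted */
    size_t offset;        /* bytes received so far */
};

/* Distribute up to `size` input bytes over the requests, in order.
 * Returns the number of input bytes consumed. */
size_t fill_requests(struct request *reqs, size_t nreqs,
                     const unsigned char *buf, size_t size)
{
    size_t buf_offset = 0;
    size_t i = 0;

    assert(reqs != NULL || nreqs == 0);
    while (buf_offset < size && i < nreqs) {
        struct request *req = &reqs[i];
        size_t want = req->size - req->offset;
        size_t left = size - buf_offset;
        size_t len = want < left ? want : left;

        /* Read from buf + buf_offset, not buf: without the offset every
         * request would be filled from the first bytes of the input. */
        memcpy(req->data + req->offset, buf + buf_offset, len);
        buf_offset += len;
        req->offset += len;
        if (req->offset == req->size) {
            i++;  /* request satisfied, move to the next one */
        }
    }
    return buf_offset;
}
```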

@ -336,8 +336,8 @@ static void init_blk_migration_it(void *opaque, BlockDriverState *bs)
bmds->completed_sectors = 0;
bmds->shared_base = block_mig_state.shared_base;
alloc_aio_bitmap(bmds);
drive_get_ref(drive_get_by_blockdev(bs));
bdrv_set_in_use(bs, 1);
bdrv_ref(bs);
block_mig_state.total_sector_sum += sectors;
@ -575,7 +575,7 @@ static void blk_mig_cleanup(void)
while ((bmds = QSIMPLEQ_FIRST(&block_mig_state.bmds_list)) != NULL) {
QSIMPLEQ_REMOVE_HEAD(&block_mig_state.bmds_list, entry);
bdrv_set_in_use(bmds->bs, 0);
drive_put_ref(drive_get_by_blockdev(bmds->bs));
bdrv_unref(bmds->bs);
g_free(bmds->aio_bitmap);
g_free(bmds);
}

927
block.c

File diff suppressed because it is too large

@ -1,8 +1,8 @@
block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
block-obj-y += raw_bsd.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
block-obj-y += qed-check.o
block-obj-y += vhdx.o
block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
block-obj-y += parallels.o blkdebug.o blkverify.o
block-obj-y += snapshot.o qapi.o
block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o

@ -202,9 +202,9 @@ static void backup_iostatus_reset(BlockJob *job)
bdrv_iostatus_reset(s->target);
}
static const BlockJobType backup_job_type = {
static const BlockJobDriver backup_job_driver = {
.instance_size = sizeof(BackupBlockJob),
.job_type = "backup",
.job_type = BLOCK_JOB_TYPE_BACKUP,
.set_speed = backup_set_speed,
.iostatus_reset = backup_iostatus_reset,
};
@ -272,9 +272,9 @@ static void coroutine_fn backup_run(void *opaque)
uint64_t delay_ns = ratelimit_calculate_delay(
&job->limit, job->sectors_read);
job->sectors_read = 0;
block_job_sleep_ns(&job->common, rt_clock, delay_ns);
block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, delay_ns);
} else {
block_job_sleep_ns(&job->common, rt_clock, 0);
block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, 0);
}
if (block_job_is_cancelled(&job->common)) {
@ -289,14 +289,14 @@ static void coroutine_fn backup_run(void *opaque)
* backing file. */
for (i = 0; i < BACKUP_SECTORS_PER_CLUSTER;) {
/* bdrv_co_is_allocated() only returns true/false based
* on the first set of sectors it comes accross that
/* bdrv_is_allocated() only returns true/false based
* on the first set of sectors it comes across that
* are all in the same state.
* For that reason we must verify each sector in the
* backup cluster length. We end up copying more than
* needed but at some point that is always the case. */
alloced =
bdrv_co_is_allocated(bs,
bdrv_is_allocated(bs,
start * BACKUP_SECTORS_PER_CLUSTER + i,
BACKUP_SECTORS_PER_CLUSTER - i, &n);
i += n;
@ -338,7 +338,7 @@ static void coroutine_fn backup_run(void *opaque)
hbitmap_free(job->bitmap);
bdrv_iostatus_disable(target);
bdrv_delete(target);
bdrv_unref(target);
block_job_completed(&job->common, ret);
}
@ -370,7 +370,7 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
return;
}
BackupBlockJob *job = block_job_create(&backup_job_type, bs, speed,
BackupBlockJob *job = block_job_create(&backup_job_driver, bs, speed,
cb, opaque, errp);
if (!job) {
return;

@ -168,6 +168,7 @@ static const char *event_names[BLKDBG_EVENT_MAX] = {
[BLKDBG_REFTABLE_LOAD] = "reftable_load",
[BLKDBG_REFTABLE_GROW] = "reftable_grow",
[BLKDBG_REFTABLE_UPDATE] = "reftable_update",
[BLKDBG_REFBLOCK_LOAD] = "refblock_load",
[BLKDBG_REFBLOCK_UPDATE] = "refblock_update",
@ -349,7 +350,8 @@ static QemuOptsList runtime_opts = {
},
};
static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags)
static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVBlkdebugState *s = bs->opaque;
QemuOpts *opts;
@ -360,8 +362,7 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags)
opts = qemu_opts_create_nofail(&runtime_opts);
qemu_opts_absorb_qdict(opts, options, &local_err);
if (error_is_set(&local_err)) {
qerror_report_err(local_err);
error_free(local_err);
error_propagate(errp, local_err);
ret = -EINVAL;
goto fail;
}
@ -371,6 +372,7 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags)
if (config) {
ret = read_config(s, config);
if (ret < 0) {
error_setg_errno(errp, -ret, "Could not read blkdebug config file");
goto fail;
}
}
@ -381,12 +383,14 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags)
/* Open the backing file */
filename = qemu_opt_get(opts, "x-image");
if (filename == NULL) {
error_setg(errp, "Could not retrieve image file name");
ret = -EINVAL;
goto fail;
}
ret = bdrv_file_open(&bs->file, filename, NULL, flags);
ret = bdrv_file_open(&bs->file, filename, NULL, flags, &local_err);
if (ret < 0) {
error_propagate(errp, local_err);
goto fail;
}

@ -116,7 +116,8 @@ static QemuOptsList runtime_opts = {
},
};
static int blkverify_open(BlockDriverState *bs, QDict *options, int flags)
static int blkverify_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVBlkverifyState *s = bs->opaque;
QemuOpts *opts;
@ -127,8 +128,7 @@ static int blkverify_open(BlockDriverState *bs, QDict *options, int flags)
opts = qemu_opts_create_nofail(&runtime_opts);
qemu_opts_absorb_qdict(opts, options, &local_err);
if (error_is_set(&local_err)) {
qerror_report_err(local_err);
error_free(local_err);
error_propagate(errp, local_err);
ret = -EINVAL;
goto fail;
}
@ -136,26 +136,30 @@ static int blkverify_open(BlockDriverState *bs, QDict *options, int flags)
/* Parse the raw image filename */
raw = qemu_opt_get(opts, "x-raw");
if (raw == NULL) {
error_setg(errp, "Could not retrieve raw image filename");
ret = -EINVAL;
goto fail;
}
ret = bdrv_file_open(&bs->file, raw, NULL, flags);
ret = bdrv_file_open(&bs->file, raw, NULL, flags, &local_err);
if (ret < 0) {
error_propagate(errp, local_err);
goto fail;
}
/* Open the test file */
filename = qemu_opt_get(opts, "x-image");
if (filename == NULL) {
error_setg(errp, "Could not retrieve test image filename");
ret = -EINVAL;
goto fail;
}
s->test_file = bdrv_new("");
ret = bdrv_open(s->test_file, filename, NULL, flags, NULL);
ret = bdrv_open(s->test_file, filename, NULL, flags, NULL, &local_err);
if (ret < 0) {
bdrv_delete(s->test_file);
error_propagate(errp, local_err);
bdrv_unref(s->test_file);
s->test_file = NULL;
goto fail;
}
@ -169,7 +173,7 @@ static void blkverify_close(BlockDriverState *bs)
{
BDRVBlkverifyState *s = bs->opaque;
bdrv_delete(s->test_file);
bdrv_unref(s->test_file);
s->test_file = NULL;
}
@ -412,6 +416,8 @@ static BlockDriver bdrv_blkverify = {
.bdrv_aio_readv = blkverify_aio_readv,
.bdrv_aio_writev = blkverify_aio_writev,
.bdrv_aio_flush = blkverify_aio_flush,
.bdrv_check_ext_snapshot = bdrv_check_ext_snapshot_forbidden,
};
static void bdrv_blkverify_init(void)

@ -108,7 +108,8 @@ static int bochs_probe(const uint8_t *buf, int buf_size, const char *filename)
return 0;
}
static int bochs_open(BlockDriverState *bs, QDict *options, int flags)
static int bochs_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVBochsState *s = bs->opaque;
int i;

@ -53,7 +53,8 @@ static int cloop_probe(const uint8_t *buf, int buf_size, const char *filename)
return 0;
}
static int cloop_open(BlockDriverState *bs, QDict *options, int flags)
static int cloop_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVCloopState *s = bs->opaque;
uint32_t offsets_size, max_compressed_block_size = 1, i;

@ -103,14 +103,14 @@ wait:
/* Note that even when no rate limit is applied we need to yield
* with no pending I/O here so that bdrv_drain_all() returns.
*/
block_job_sleep_ns(&s->common, rt_clock, delay_ns);
block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
if (block_job_is_cancelled(&s->common)) {
break;
}
/* Copy if allocated above the base */
ret = bdrv_co_is_allocated_above(top, base, sector_num,
COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
&n);
ret = bdrv_is_allocated_above(top, base, sector_num,
COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
&n);
copy = (ret == 1);
trace_commit_one_iteration(s, sector_num, n, ret);
if (copy) {
@ -173,9 +173,9 @@ static void commit_set_speed(BlockJob *job, int64_t speed, Error **errp)
ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
}
static const BlockJobType commit_job_type = {
static const BlockJobDriver commit_job_driver = {
.instance_size = sizeof(CommitBlockJob),
.job_type = "commit",
.job_type = BLOCK_JOB_TYPE_COMMIT,
.set_speed = commit_set_speed,
};
@ -238,7 +238,7 @@ void commit_start(BlockDriverState *bs, BlockDriverState *base,
}
s = block_job_create(&commit_job_type, bs, speed, cb, opaque, errp);
s = block_job_create(&commit_job_driver, bs, speed, cb, opaque, errp);
if (!s) {
return;
}

@ -58,7 +58,8 @@ static int cow_probe(const uint8_t *buf, int buf_size, const char *filename)
return 0;
}
static int cow_open(BlockDriverState *bs, QDict *options, int flags)
static int cow_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVCowState *s = bs->opaque;
struct cow_header_v2 cow_header;
@ -106,7 +107,7 @@ static int cow_open(BlockDriverState *bs, QDict *options, int flags)
* XXX(hch): right now these functions are extremely inefficient.
* We should just read the whole bitmap we'll need in one go instead.
*/
static inline int cow_set_bit(BlockDriverState *bs, int64_t bitnum)
static inline int cow_set_bit(BlockDriverState *bs, int64_t bitnum, bool *first)
{
uint64_t offset = sizeof(struct cow_header_v2) + bitnum / 8;
uint8_t bitmap;
@ -117,27 +118,52 @@ static inline int cow_set_bit(BlockDriverState *bs, int64_t bitnum)
return ret;
}
if (bitmap & (1 << (bitnum % 8))) {
return 0;
}
if (*first) {
ret = bdrv_flush(bs->file);
if (ret < 0) {
return ret;
}
*first = false;
}
bitmap |= (1 << (bitnum % 8));
ret = bdrv_pwrite_sync(bs->file, offset, &bitmap, sizeof(bitmap));
ret = bdrv_pwrite(bs->file, offset, &bitmap, sizeof(bitmap));
if (ret < 0) {
return ret;
}
return 0;
}
static inline int is_bit_set(BlockDriverState *bs, int64_t bitnum)
#define BITS_PER_BITMAP_SECTOR (512 * 8)
/* Cannot use bitmap.c on big-endian machines. */
static int cow_test_bit(int64_t bitnum, const uint8_t *bitmap)
{
uint64_t offset = sizeof(struct cow_header_v2) + bitnum / 8;
uint8_t bitmap;
int ret;
return (bitmap[bitnum / 8] & (1 << (bitnum & 7))) != 0;
}
ret = bdrv_pread(bs->file, offset, &bitmap, sizeof(bitmap));
if (ret < 0) {
return ret;
static int cow_find_streak(const uint8_t *bitmap, int value, int start, int nb_sectors)
{
int streak_value = value ? 0xFF : 0;
int last = MIN(start + nb_sectors, BITS_PER_BITMAP_SECTOR);
int bitnum = start;
while (bitnum < last) {
if ((bitnum & 7) == 0 && bitmap[bitnum / 8] == streak_value) {
bitnum += 8;
continue;
}
if (cow_test_bit(bitnum, bitmap) == value) {
bitnum++;
continue;
}
break;
}
return !!(bitmap & (1 << (bitnum % 8)));
return MIN(bitnum, last) - start;
}
/* Return true if first block has been changed (ie. current version is
@ -146,34 +172,44 @@ static inline int is_bit_set(BlockDriverState *bs, int64_t bitnum)
static int coroutine_fn cow_co_is_allocated(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, int *num_same)
{
int64_t bitnum = sector_num + sizeof(struct cow_header_v2) * 8;
uint64_t offset = (bitnum / 8) & -BDRV_SECTOR_SIZE;
uint8_t bitmap[BDRV_SECTOR_SIZE];
int ret;
int changed;
if (nb_sectors == 0) {
*num_same = nb_sectors;
return 0;
}
changed = is_bit_set(bs, sector_num);
if (changed < 0) {
return 0; /* XXX: how to return I/O errors? */
}
for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
if (is_bit_set(bs, sector_num + *num_same) != changed)
break;
ret = bdrv_pread(bs->file, offset, &bitmap, sizeof(bitmap));
if (ret < 0) {
return ret;
}
bitnum &= BITS_PER_BITMAP_SECTOR - 1;
changed = cow_test_bit(bitnum, bitmap);
*num_same = cow_find_streak(bitmap, changed, bitnum, nb_sectors);
return changed;
}
static int64_t coroutine_fn cow_co_get_block_status(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, int *num_same)
{
BDRVCowState *s = bs->opaque;
int ret = cow_co_is_allocated(bs, sector_num, nb_sectors, num_same);
int64_t offset = s->cow_sectors_offset + (sector_num << BDRV_SECTOR_BITS);
if (ret < 0) {
return ret;
}
return (ret ? BDRV_BLOCK_DATA : 0) | offset | BDRV_BLOCK_OFFSET_VALID;
}
static int cow_update_bitmap(BlockDriverState *bs, int64_t sector_num,
int nb_sectors)
{
int error = 0;
int i;
bool first = true;
for (i = 0; i < nb_sectors; i++) {
error = cow_set_bit(bs, sector_num + i);
error = cow_set_bit(bs, sector_num + i, &first);
if (error) {
break;
}
@ -189,7 +225,11 @@ static int coroutine_fn cow_read(BlockDriverState *bs, int64_t sector_num,
int ret, n;
while (nb_sectors > 0) {
if (bdrv_co_is_allocated(bs, sector_num, nb_sectors, &n)) {
ret = cow_co_is_allocated(bs, sector_num, nb_sectors, &n);
if (ret < 0) {
return ret;
}
if (ret) {
ret = bdrv_pread(bs->file,
s->cow_sectors_offset + sector_num * 512,
buf, n * 512);
@ -255,12 +295,14 @@ static void cow_close(BlockDriverState *bs)
{
}
static int cow_create(const char *filename, QEMUOptionParameter *options)
static int cow_create(const char *filename, QEMUOptionParameter *options,
Error **errp)
{
struct cow_header_v2 cow_header;
struct stat st;
int64_t image_sectors = 0;
const char *image_filename = NULL;
Error *local_err = NULL;
int ret;
BlockDriverState *cow_bs;
@ -274,13 +316,17 @@ static int cow_create(const char *filename, QEMUOptionParameter *options)
options++;
}
ret = bdrv_create_file(filename, options);
ret = bdrv_create_file(filename, options, &local_err);
if (ret < 0) {
qerror_report_err(local_err);
error_free(local_err);
return ret;
}
ret = bdrv_file_open(&cow_bs, filename, NULL, BDRV_O_RDWR);
ret = bdrv_file_open(&cow_bs, filename, NULL, BDRV_O_RDWR, &local_err);
if (ret < 0) {
qerror_report_err(local_err);
error_free(local_err);
return ret;
}
@ -314,7 +360,7 @@ static int cow_create(const char *filename, QEMUOptionParameter *options)
}
exit:
bdrv_delete(cow_bs);
bdrv_unref(cow_bs);
return ret;
}
@ -344,7 +390,7 @@ static BlockDriver bdrv_cow = {
.bdrv_read = cow_co_read,
.bdrv_write = cow_co_write,
.bdrv_co_is_allocated = cow_co_is_allocated,
.bdrv_co_get_block_status = cow_co_get_block_status,
.create_options = cow_create_options,
};
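
Because the cow.c hunks above interleave removed and added lines, the new streak search is hard to read in place. The following standalone sketch restates the idea: test single bits in a byte-addressed bitmap, and skip eight bits at a time when a byte-aligned position holds a uniform byte. The names test_bit and find_streak and the explicit nbits parameter are hypothetical; the diff's own functions use a fixed BITS_PER_BITMAP_SECTOR bound instead.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if bit `bitnum` is set, counting from the least significant
 * bit of byte 0 (the layout is independent of host endianness). */
int test_bit(int64_t bitnum, const uint8_t *bitmap)
{
    return (bitmap[bitnum / 8] & (1 << (bitnum & 7))) != 0;
}

/* Count consecutive bits equal to `value`, starting at `start`,
 * examining at most `count` bits and never going past `nbits`. */
int find_streak(const uint8_t *bitmap, int nbits, int value,
                int start, int count)
{
    int whole_byte = value ? 0xFF : 0;
    int last = (start + count < nbits) ? start + count : nbits;
    int bitnum = start;

    assert(value == 0 || value == 1);
    assert(start >= 0 && start <= nbits && count >= 0);
    while (bitnum < last) {
        /* Fast path: a byte-aligned position with a uniform byte. */
        if ((bitnum & 7) == 0 && bitmap[bitnum / 8] == whole_byte) {
            bitnum += 8;
            continue;
        }
        if (test_bit(bitnum, bitmap) == value) {
            bitnum++;
            continue;
        }
        break;
    }
    /* The fast path may overshoot `last`, so clamp before subtracting. */
    return ((bitnum < last) ? bitnum : last) - start;
}
```

The clamp on the return path matters because the byte-wide skip can step past the requested range; the diff handles the same case with MIN(bitnum, last).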

@ -86,7 +86,6 @@ typedef struct BDRVCURLState {
static void curl_clean_state(CURLState *s);
static void curl_multi_do(void *arg);
static int curl_aio_flush(void *opaque);
static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action,
void *s, void *sp)
@ -94,17 +93,16 @@ static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action,
DPRINTF("CURL (AIO): Sock action %d on fd %d\n", action, fd);
switch (action) {
case CURL_POLL_IN:
qemu_aio_set_fd_handler(fd, curl_multi_do, NULL, curl_aio_flush, s);
qemu_aio_set_fd_handler(fd, curl_multi_do, NULL, s);
break;
case CURL_POLL_OUT:
qemu_aio_set_fd_handler(fd, NULL, curl_multi_do, curl_aio_flush, s);
qemu_aio_set_fd_handler(fd, NULL, curl_multi_do, s);
break;
case CURL_POLL_INOUT:
qemu_aio_set_fd_handler(fd, curl_multi_do, curl_multi_do,
curl_aio_flush, s);
qemu_aio_set_fd_handler(fd, curl_multi_do, curl_multi_do, s);
break;
case CURL_POLL_REMOVE:
qemu_aio_set_fd_handler(fd, NULL, NULL, NULL, NULL);
qemu_aio_set_fd_handler(fd, NULL, NULL, NULL);
break;
}
@ -397,7 +395,8 @@ static QemuOptsList runtime_opts = {
},
};
static int curl_open(BlockDriverState *bs, QDict *options, int flags)
static int curl_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVCURLState *s = bs->opaque;
CURLState *state = NULL;
@ -495,21 +494,6 @@ out_noclean:
return -EINVAL;
}
static int curl_aio_flush(void *opaque)
{
BDRVCURLState *s = opaque;
int i, j;
for (i=0; i < CURL_NUM_STATES; i++) {
for(j=0; j < CURL_NUM_ACB; j++) {
if (s->states[i].acb[j]) {
return 1;
}
}
}
return 0;
}
static void curl_aio_cancel(BlockDriverAIOCB *blockacb)
{
// Do we have to implement canceling? Seems to work without...
@ -589,12 +573,6 @@ static BlockDriverAIOCB *curl_aio_readv(BlockDriverState *bs,
acb->nb_sectors = nb_sectors;
acb->bh = qemu_bh_new(curl_readv_bh_cb, acb);
if (!acb->bh) {
DPRINTF("CURL: qemu_bh_new failed\n");
return NULL;
}
qemu_bh_schedule(acb->bh);
return &acb->common;
}

@ -92,7 +92,8 @@ static int read_uint32(BlockDriverState *bs, int64_t offset, uint32_t *result)
return 0;
}
static int dmg_open(BlockDriverState *bs, QDict *options, int flags)
static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVDMGState *s = bs->opaque;
uint64_t info_begin,info_end,last_in_offset,last_out_offset;

@ -32,7 +32,6 @@ typedef struct BDRVGlusterState {
struct glfs *glfs;
int fds[2];
struct glfs_fd *fd;
int qemu_aio_count;
int event_reader_pos;
GlusterAIOCB *event_acb;
} BDRVGlusterState;
@ -247,7 +246,6 @@ static void qemu_gluster_complete_aio(GlusterAIOCB *acb, BDRVGlusterState *s)
ret = -EIO; /* Partial read/write - fail it */
}
s->qemu_aio_count--;
qemu_aio_release(acb);
cb(opaque, ret);
if (finished) {
@ -275,13 +273,6 @@ static void qemu_gluster_aio_event_reader(void *opaque)
} while (ret < 0 && errno == EINTR);
}
static int qemu_gluster_aio_flush_cb(void *opaque)
{
BDRVGlusterState *s = opaque;
return (s->qemu_aio_count > 0);
}
/* TODO Convert to fine grained options */
static QemuOptsList runtime_opts = {
.name = "gluster",
@ -297,7 +288,7 @@ static QemuOptsList runtime_opts = {
};
static int qemu_gluster_open(BlockDriverState *bs, QDict *options,
int bdrv_flags)
int bdrv_flags, Error **errp)
{
BDRVGlusterState *s = bs->opaque;
int open_flags = O_BINARY;
@ -348,7 +339,7 @@ static int qemu_gluster_open(BlockDriverState *bs, QDict *options,
}
fcntl(s->fds[GLUSTER_FD_READ], F_SETFL, O_NONBLOCK);
qemu_aio_set_fd_handler(s->fds[GLUSTER_FD_READ],
qemu_gluster_aio_event_reader, NULL, qemu_gluster_aio_flush_cb, s);
qemu_gluster_aio_event_reader, NULL, s);
out:
qemu_opts_del(opts);
@ -366,7 +357,7 @@ out:
}
static int qemu_gluster_create(const char *filename,
QEMUOptionParameter *options)
QEMUOptionParameter *options, Error **errp)
{
struct glfs *glfs;
struct glfs_fd *fd;
@ -436,22 +427,9 @@ static void gluster_finish_aiocb(struct glfs_fd *fd, ssize_t ret, void *arg)
/*
* Gluster AIO callback thread failed to notify the waiting
* QEMU thread about IO completion.
*
* Complete this IO request and make the disk inaccessible for
* subsequent reads and writes.
*/
error_report("Gluster failed to notify QEMU about IO completion");
qemu_mutex_lock_iothread(); /* We are in gluster thread context */
acb->common.cb(acb->common.opaque, -EIO);
qemu_aio_release(acb);
s->qemu_aio_count--;
close(s->fds[GLUSTER_FD_READ]);
close(s->fds[GLUSTER_FD_WRITE]);
qemu_aio_set_fd_handler(s->fds[GLUSTER_FD_READ], NULL, NULL, NULL,
NULL);
bs->drv = NULL; /* Make the disk inaccessible */
qemu_mutex_unlock_iothread();
error_report("Gluster AIO completion failed: %s", strerror(errno));
abort();
}
}
@ -467,7 +445,6 @@ static BlockDriverAIOCB *qemu_gluster_aio_rw(BlockDriverState *bs,
offset = sector_num * BDRV_SECTOR_SIZE;
size = nb_sectors * BDRV_SECTOR_SIZE;
s->qemu_aio_count++;
acb = qemu_aio_get(&gluster_aiocb_info, bs, cb, opaque);
acb->size = size;
@ -488,7 +465,6 @@ static BlockDriverAIOCB *qemu_gluster_aio_rw(BlockDriverState *bs,
return &acb->common;
out:
s->qemu_aio_count--;
qemu_aio_release(acb);
return NULL;
}
@ -531,7 +507,6 @@ static BlockDriverAIOCB *qemu_gluster_aio_flush(BlockDriverState *bs,
acb->size = 0;
acb->ret = 0;
acb->finished = NULL;
s->qemu_aio_count++;
ret = glfs_fsync_async(s->fd, &gluster_finish_aiocb, acb);
if (ret < 0) {
@ -540,7 +515,6 @@ static BlockDriverAIOCB *qemu_gluster_aio_flush(BlockDriverState *bs,
return &acb->common;
out:
s->qemu_aio_count--;
qemu_aio_release(acb);
return NULL;
}
@ -563,7 +537,6 @@ static BlockDriverAIOCB *qemu_gluster_aio_discard(BlockDriverState *bs,
acb->size = 0;
acb->ret = 0;
acb->finished = NULL;
s->qemu_aio_count++;
ret = glfs_discard_async(s->fd, offset, size, &gluster_finish_aiocb, acb);
if (ret < 0) {
@ -572,7 +545,6 @@ static BlockDriverAIOCB *qemu_gluster_aio_discard(BlockDriverState *bs,
return &acb->common;
out:
s->qemu_aio_count--;
qemu_aio_release(acb);
return NULL;
}
@ -611,7 +583,7 @@ static void qemu_gluster_close(BlockDriverState *bs)
close(s->fds[GLUSTER_FD_READ]);
close(s->fds[GLUSTER_FD_WRITE]);
qemu_aio_set_fd_handler(s->fds[GLUSTER_FD_READ], NULL, NULL, NULL, NULL);
qemu_aio_set_fd_handler(s->fds[GLUSTER_FD_READ], NULL, NULL, NULL);
if (s->fd) {
glfs_close(s->fd);
@ -639,6 +611,7 @@ static BlockDriver bdrv_gluster = {
.format_name = "gluster",
.protocol_name = "gluster",
.instance_size = sizeof(BDRVGlusterState),
.bdrv_needs_filename = true,
.bdrv_file_open = qemu_gluster_open,
.bdrv_close = qemu_gluster_close,
.bdrv_create = qemu_gluster_create,
@ -659,6 +632,7 @@ static BlockDriver bdrv_gluster_tcp = {
.format_name = "gluster",
.protocol_name = "gluster+tcp",
.instance_size = sizeof(BDRVGlusterState),
.bdrv_needs_filename = true,
.bdrv_file_open = qemu_gluster_open,
.bdrv_close = qemu_gluster_close,
.bdrv_create = qemu_gluster_create,
@ -679,6 +653,7 @@ static BlockDriver bdrv_gluster_unix = {
.format_name = "gluster",
.protocol_name = "gluster+unix",
.instance_size = sizeof(BDRVGlusterState),
.bdrv_needs_filename = true,
.bdrv_file_open = qemu_gluster_open,
.bdrv_close = qemu_gluster_close,
.bdrv_create = qemu_gluster_create,
@ -699,6 +674,7 @@ static BlockDriver bdrv_gluster_rdma = {
.format_name = "gluster",
.protocol_name = "gluster+rdma",
.instance_size = sizeof(BDRVGlusterState),
.bdrv_needs_filename = true,
.bdrv_file_open = qemu_gluster_open,
.bdrv_close = qemu_gluster_close,
.bdrv_create = qemu_gluster_create,

@ -33,6 +33,8 @@
#include "trace.h"
#include "block/scsi.h"
#include "qemu/iov.h"
#include "sysemu/sysemu.h"
#include "qmp-commands.h"
#include <iscsi/iscsi.h>
#include <iscsi/scsi-lowlevel.h>
@ -50,8 +52,21 @@ typedef struct IscsiLun {
uint64_t num_blocks;
int events;
QEMUTimer *nop_timer;
uint8_t lbpme;
uint8_t lbprz;
struct scsi_inquiry_logical_block_provisioning lbp;
struct scsi_inquiry_block_limits bl;
} IscsiLun;
typedef struct IscsiTask {
int status;
int complete;
int retries;
int do_retry;
struct scsi_task *task;
Coroutine *co;
} IscsiTask;
typedef struct IscsiAIOCB {
BlockDriverAIOCB common;
QEMUIOVector *qiov;
@ -72,6 +87,7 @@ typedef struct IscsiAIOCB {
#define NOP_INTERVAL 5000
#define MAX_NOP_FAILURES 3
#define ISCSI_CMD_RETRIES 5
#define ISCSI_MAX_UNMAP 131072
static void
iscsi_bh_cb(void *p)
@ -105,6 +121,41 @@ iscsi_schedule_bh(IscsiAIOCB *acb)
qemu_bh_schedule(acb->bh);
}
static void
iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
void *command_data, void *opaque)
{
struct IscsiTask *iTask = opaque;
struct scsi_task *task = command_data;
iTask->complete = 1;
iTask->status = status;
iTask->do_retry = 0;
iTask->task = task;
if (iTask->retries-- > 0 && status == SCSI_STATUS_CHECK_CONDITION
&& task->sense.key == SCSI_SENSE_UNIT_ATTENTION) {
iTask->do_retry = 1;
goto out;
}
if (status != SCSI_STATUS_GOOD) {
error_report("iSCSI: Failure. %s", iscsi_get_error(iscsi));
}
out:
if (iTask->co) {
qemu_coroutine_enter(iTask->co, NULL);
}
}
static void iscsi_co_init_iscsitask(IscsiLun *iscsilun, struct IscsiTask *iTask)
{
*iTask = (struct IscsiTask) {
.co = qemu_coroutine_self(),
.retries = ISCSI_CMD_RETRIES,
};
}
static void
iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void *command_data,
@ -146,13 +197,6 @@ static const AIOCBInfo iscsi_aiocb_info = {
static void iscsi_process_read(void *arg);
static void iscsi_process_write(void *arg);
static int iscsi_process_flush(void *arg)
{
IscsiLun *iscsilun = arg;
return iscsi_queue_length(iscsilun->iscsi) > 0;
}
static void
iscsi_set_events(IscsiLun *iscsilun)
{
@ -166,7 +210,6 @@ iscsi_set_events(IscsiLun *iscsilun)
qemu_aio_set_fd_handler(iscsi_get_fd(iscsi),
iscsi_process_read,
(ev & POLLOUT) ? iscsi_process_write : NULL,
iscsi_process_flush,
iscsilun);
}
@ -576,88 +619,6 @@ iscsi_aio_flush(BlockDriverState *bs,
return &acb->common;
}
static int iscsi_aio_discard_acb(IscsiAIOCB *acb);
static void
iscsi_unmap_cb(struct iscsi_context *iscsi, int status,
void *command_data, void *opaque)
{
IscsiAIOCB *acb = opaque;
if (acb->canceled != 0) {
return;
}
acb->status = 0;
if (status != 0) {
if (status == SCSI_STATUS_CHECK_CONDITION
&& acb->task->sense.key == SCSI_SENSE_UNIT_ATTENTION
&& acb->retries-- > 0) {
scsi_free_scsi_task(acb->task);
acb->task = NULL;
if (iscsi_aio_discard_acb(acb) == 0) {
iscsi_set_events(acb->iscsilun);
return;
}
}
error_report("Failed to unmap data on iSCSI lun. %s",
iscsi_get_error(iscsi));
acb->status = -EIO;
}
iscsi_schedule_bh(acb);
}
static int iscsi_aio_discard_acb(IscsiAIOCB *acb) {
struct iscsi_context *iscsi = acb->iscsilun->iscsi;
struct unmap_list list[1];
acb->canceled = 0;
acb->bh = NULL;
acb->status = -EINPROGRESS;
acb->buf = NULL;
list[0].lba = sector_qemu2lun(acb->sector_num, acb->iscsilun);
list[0].num = acb->nb_sectors * BDRV_SECTOR_SIZE / acb->iscsilun->block_size;
acb->task = iscsi_unmap_task(iscsi, acb->iscsilun->lun,
0, 0, &list[0], 1,
iscsi_unmap_cb,
acb);
if (acb->task == NULL) {
error_report("iSCSI: Failed to send unmap command. %s",
iscsi_get_error(iscsi));
return -1;
}
return 0;
}
static BlockDriverAIOCB *
iscsi_aio_discard(BlockDriverState *bs,
int64_t sector_num, int nb_sectors,
BlockDriverCompletionFunc *cb, void *opaque)
{
IscsiLun *iscsilun = bs->opaque;
IscsiAIOCB *acb;
acb = qemu_aio_get(&iscsi_aiocb_info, bs, cb, opaque);
acb->iscsilun = iscsilun;
acb->nb_sectors = nb_sectors;
acb->sector_num = sector_num;
acb->retries = ISCSI_CMD_RETRIES;
if (iscsi_aio_discard_acb(acb) != 0) {
qemu_aio_release(acb);
return NULL;
}
iscsi_set_events(iscsilun);
return &acb->common;
}
#ifdef __linux__
static void
iscsi_aio_ioctl_cb(struct iscsi_context *iscsi, int status,
@ -850,6 +811,171 @@ iscsi_getlength(BlockDriverState *bs)
return len;
}
#if defined(LIBISCSI_FEATURE_IOVECTOR)
static int64_t coroutine_fn iscsi_co_get_block_status(BlockDriverState *bs,
int64_t sector_num,
int nb_sectors, int *pnum)
{
IscsiLun *iscsilun = bs->opaque;
struct scsi_get_lba_status *lbas = NULL;
struct scsi_lba_status_descriptor *lbasd = NULL;
struct IscsiTask iTask;
int64_t ret;
iscsi_co_init_iscsitask(iscsilun, &iTask);
if (!is_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
ret = -EINVAL;
goto out;
}
/* default to all sectors allocated */
ret = BDRV_BLOCK_DATA;
ret |= (sector_num << BDRV_SECTOR_BITS) | BDRV_BLOCK_OFFSET_VALID;
*pnum = nb_sectors;
/* LUN does not support logical block provisioning */
if (iscsilun->lbpme == 0) {
goto out;
}
retry:
if (iscsi_get_lba_status_task(iscsilun->iscsi, iscsilun->lun,
sector_qemu2lun(sector_num, iscsilun),
8 + 16, iscsi_co_generic_cb,
&iTask) == NULL) {
ret = -EIO;
goto out;
}
while (!iTask.complete) {
iscsi_set_events(iscsilun);
qemu_coroutine_yield();
}
if (iTask.do_retry) {
if (iTask.task != NULL) {
scsi_free_scsi_task(iTask.task);
iTask.task = NULL;
}
goto retry;
}
if (iTask.status != SCSI_STATUS_GOOD) {
/* in case the get_lba_status_callout fails (e.g.
* because the device is busy or the cmd is not
* supported) we pretend all blocks are allocated
* for backwards compatibility */
goto out;
}
lbas = scsi_datain_unmarshall(iTask.task);
if (lbas == NULL) {
ret = -EIO;
goto out;
}
lbasd = &lbas->descriptors[0];
if (sector_qemu2lun(sector_num, iscsilun) != lbasd->lba) {
ret = -EIO;
goto out;
}
*pnum = sector_lun2qemu(lbasd->num_blocks, iscsilun);
if (*pnum > nb_sectors) {
*pnum = nb_sectors;
}
if (lbasd->provisioning == SCSI_PROVISIONING_TYPE_DEALLOCATED ||
lbasd->provisioning == SCSI_PROVISIONING_TYPE_ANCHORED) {
ret &= ~BDRV_BLOCK_DATA;
if (iscsilun->lbprz) {
ret |= BDRV_BLOCK_ZERO;
}
}
out:
if (iTask.task != NULL) {
scsi_free_scsi_task(iTask.task);
}
return ret;
}
#endif /* LIBISCSI_FEATURE_IOVECTOR */
static int
coroutine_fn iscsi_co_discard(BlockDriverState *bs, int64_t sector_num,
int nb_sectors)
{
IscsiLun *iscsilun = bs->opaque;
struct IscsiTask iTask;
struct unmap_list list;
uint32_t nb_blocks;
uint32_t max_unmap;
if (!is_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
return -EINVAL;
}
if (!iscsilun->lbp.lbpu) {
/* UNMAP is not supported by the target */
return 0;
}
list.lba = sector_qemu2lun(sector_num, iscsilun);
nb_blocks = sector_qemu2lun(nb_sectors, iscsilun);
max_unmap = iscsilun->bl.max_unmap;
if (max_unmap == 0xffffffff) {
max_unmap = ISCSI_MAX_UNMAP;
}
while (nb_blocks > 0) {
iscsi_co_init_iscsitask(iscsilun, &iTask);
list.num = nb_blocks;
if (list.num > max_unmap) {
list.num = max_unmap;
}
retry:
if (iscsi_unmap_task(iscsilun->iscsi, iscsilun->lun, 0, 0, &list, 1,
iscsi_co_generic_cb, &iTask) == NULL) {
return -EIO;
}
while (!iTask.complete) {
iscsi_set_events(iscsilun);
qemu_coroutine_yield();
}
if (iTask.task != NULL) {
scsi_free_scsi_task(iTask.task);
iTask.task = NULL;
}
if (iTask.do_retry) {
goto retry;
}
if (iTask.status == SCSI_STATUS_CHECK_CONDITION) {
/* the target might fail with a check condition if it
is not happy with the alignment of the UNMAP request;
we silently ignore the failure in this case */
return 0;
}
if (iTask.status != SCSI_STATUS_GOOD) {
return -EIO;
}
list.lba += list.num;
nb_blocks -= list.num;
}
return 0;
}
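The chunking loop in iscsi_co_discard above can be illustrated in isolation. What follows is a minimal sketch, not the driver code: it assumes a hypothetical callback type `sk_unmap_fn` standing in for the asynchronous iscsi_unmap_task call, and the names `sk_discard_in_chunks`, `sk_unmap_stats` and `sk_count_unmap` are invented for the example. It only shows how a request is split into pieces no larger than the target's advertised maximum.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback: issues one UNMAP-like request, returns 0 on success. */
typedef int (*sk_unmap_fn)(uint64_t lba, uint32_t num, void *opaque);

/*
 * Split [lba, lba + nb_blocks) into chunks of at most max_unmap blocks
 * and hand each chunk to the callback. Stops at the first failure.
 */
static int sk_discard_in_chunks(uint64_t lba, uint32_t nb_blocks,
                                uint32_t max_unmap, sk_unmap_fn issue,
                                void *opaque)
{
    assert(max_unmap > 0);

    while (nb_blocks > 0) {
        uint32_t num = nb_blocks < max_unmap ? nb_blocks : max_unmap;
        int ret = issue(lba, num, opaque);

        if (ret != 0) {
            return ret;
        }
        lba += num;
        nb_blocks -= num;
    }
    return 0;
}

/* Example callback (illustrative only): records how often it was called. */
struct sk_unmap_stats {
    unsigned calls;
    uint64_t blocks;
};

static int sk_count_unmap(uint64_t lba, uint32_t num, void *opaque)
{
    struct sk_unmap_stats *st = opaque;

    (void)lba;
    st->calls++;
    st->blocks += num;
    return 0;
}
```

With a limit of 4 blocks, a 10-block request would be issued as three requests of 4, 4 and 2 blocks.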
static int parse_chap(struct iscsi_context *iscsi, const char *target)
{
QemuOptsList *list;
@ -930,8 +1056,9 @@ static char *parse_initiator_name(const char *target)
{
QemuOptsList *list;
QemuOpts *opts;
const char *name = NULL;
const char *iscsi_name = qemu_get_vm_name();
const char *name;
char *iscsi_name;
UuidInfo *uuid_info;
list = qemu_find_opts("iscsi");
if (list) {
@ -941,16 +1068,22 @@ static char *parse_initiator_name(const char *target)
}
if (opts) {
name = qemu_opt_get(opts, "initiator-name");
if (name) {
return g_strdup(name);
}
}
}
if (name) {
return g_strdup(name);
uuid_info = qmp_query_uuid(NULL);
if (strcmp(uuid_info->UUID, UUID_NONE) == 0) {
name = qemu_get_vm_name();
} else {
return g_strdup_printf("iqn.2008-11.org.linux-kvm%s%s",
iscsi_name ? ":" : "",
iscsi_name ? iscsi_name : "");
name = uuid_info->UUID;
}
iscsi_name = g_strdup_printf("iqn.2008-11.org.linux-kvm%s%s",
name ? ":" : "", name ? name : "");
qapi_free_UuidInfo(uuid_info);
return iscsi_name;
}
#if defined(LIBISCSI_FEATURE_NOP_COUNTER)
@ -968,7 +1101,7 @@ static void iscsi_nop_timed_event(void *opaque)
return;
}
qemu_mod_timer(iscsilun->nop_timer, qemu_get_clock_ms(rt_clock) + NOP_INTERVAL);
timer_mod(iscsilun->nop_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL);
iscsi_set_events(iscsilun);
}
#endif
@ -998,6 +1131,8 @@ static int iscsi_readcapacity_sync(IscsiLun *iscsilun)
} else {
iscsilun->block_size = rc16->block_length;
iscsilun->num_blocks = rc16->returned_lba + 1;
iscsilun->lbpme = rc16->lbpme;
iscsilun->lbprz = rc16->lbprz;
}
}
break;
@ -1050,11 +1185,43 @@ static QemuOptsList runtime_opts = {
},
};
static struct scsi_task *iscsi_do_inquiry(struct iscsi_context *iscsi,
int lun, int evpd, int pc) {
int full_size;
struct scsi_task *task = NULL;
task = iscsi_inquiry_sync(iscsi, lun, evpd, pc, 64);
if (task == NULL || task->status != SCSI_STATUS_GOOD) {
goto fail;
}
full_size = scsi_datain_getfullsize(task);
if (full_size > task->datain.size) {
scsi_free_scsi_task(task);
/* we need more data for the full list */
task = iscsi_inquiry_sync(iscsi, lun, evpd, pc, full_size);
if (task == NULL || task->status != SCSI_STATUS_GOOD) {
goto fail;
}
}
return task;
fail:
error_report("iSCSI: Inquiry command failed : %s",
iscsi_get_error(iscsi));
if (task) {
scsi_free_scsi_task(task);
return NULL;
}
return NULL;
}
/*
* We support iSCSI URLs of the form
* iscsi://[<username>%<password>@]<host>[:<port>]/<targetname>/<lun>
*/
static int iscsi_open(BlockDriverState *bs, QDict *options, int flags)
static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
IscsiLun *iscsilun = bs->opaque;
struct iscsi_context *iscsi = NULL;
@ -1179,10 +1346,50 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags)
bs->sg = 1;
}
if (iscsilun->lbpme) {
struct scsi_inquiry_logical_block_provisioning *inq_lbp;
task = iscsi_do_inquiry(iscsilun->iscsi, iscsilun->lun, 1,
SCSI_INQUIRY_PAGECODE_LOGICAL_BLOCK_PROVISIONING);
if (task == NULL) {
ret = -EINVAL;
goto out;
}
inq_lbp = scsi_datain_unmarshall(task);
if (inq_lbp == NULL) {
error_report("iSCSI: failed to unmarshall inquiry datain blob");
ret = -EINVAL;
goto out;
}
memcpy(&iscsilun->lbp, inq_lbp,
sizeof(struct scsi_inquiry_logical_block_provisioning));
scsi_free_scsi_task(task);
task = NULL;
}
if (iscsilun->lbp.lbpu || iscsilun->lbp.lbpws) {
struct scsi_inquiry_block_limits *inq_bl;
task = iscsi_do_inquiry(iscsilun->iscsi, iscsilun->lun, 1,
SCSI_INQUIRY_PAGECODE_BLOCK_LIMITS);
if (task == NULL) {
ret = -EINVAL;
goto out;
}
inq_bl = scsi_datain_unmarshall(task);
if (inq_bl == NULL) {
error_report("iSCSI: failed to unmarshall inquiry datain blob");
ret = -EINVAL;
goto out;
}
memcpy(&iscsilun->bl, inq_bl,
sizeof(struct scsi_inquiry_block_limits));
scsi_free_scsi_task(task);
task = NULL;
}
#if defined(LIBISCSI_FEATURE_NOP_COUNTER)
/* Set up a timer for sending out iSCSI NOPs */
iscsilun->nop_timer = qemu_new_timer_ms(rt_clock, iscsi_nop_timed_event, iscsilun);
qemu_mod_timer(iscsilun->nop_timer, qemu_get_clock_ms(rt_clock) + NOP_INTERVAL);
iscsilun->nop_timer = timer_new_ms(QEMU_CLOCK_REALTIME, iscsi_nop_timed_event, iscsilun);
timer_mod(iscsilun->nop_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL);
#endif
out:
@ -1212,10 +1419,10 @@ static void iscsi_close(BlockDriverState *bs)
struct iscsi_context *iscsi = iscsilun->iscsi;
if (iscsilun->nop_timer) {
qemu_del_timer(iscsilun->nop_timer);
qemu_free_timer(iscsilun->nop_timer);
timer_del(iscsilun->nop_timer);
timer_free(iscsilun->nop_timer);
}
qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), NULL, NULL, NULL, NULL);
qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), NULL, NULL, NULL);
iscsi_destroy_context(iscsi);
memset(iscsilun, 0, sizeof(IscsiLun));
}
@ -1245,15 +1452,16 @@ static int iscsi_has_zero_init(BlockDriverState *bs)
return 0;
}
static int iscsi_create(const char *filename, QEMUOptionParameter *options)
static int iscsi_create(const char *filename, QEMUOptionParameter *options,
Error **errp)
{
int ret = 0;
int64_t total_size = 0;
BlockDriverState bs;
BlockDriverState *bs;
IscsiLun *iscsilun = NULL;
QDict *bs_options;
memset(&bs, 0, sizeof(BlockDriverState));
bs = bdrv_new("");
/* Read out options */
while (options && options->name) {
@ -1263,26 +1471,26 @@ static int iscsi_create(const char *filename, QEMUOptionParameter *options)
options++;
}
bs.opaque = g_malloc0(sizeof(struct IscsiLun));
iscsilun = bs.opaque;
bs->opaque = g_malloc0(sizeof(struct IscsiLun));
iscsilun = bs->opaque;
bs_options = qdict_new();
qdict_put(bs_options, "filename", qstring_from_str(filename));
ret = iscsi_open(&bs, bs_options, 0);
ret = iscsi_open(bs, bs_options, 0, NULL);
QDECREF(bs_options);
if (ret != 0) {
goto out;
}
if (iscsilun->nop_timer) {
qemu_del_timer(iscsilun->nop_timer);
qemu_free_timer(iscsilun->nop_timer);
timer_del(iscsilun->nop_timer);
timer_free(iscsilun->nop_timer);
}
if (iscsilun->type != TYPE_DISK) {
ret = -ENODEV;
goto out;
}
if (bs.total_sectors < total_size) {
if (bs->total_sectors < total_size) {
ret = -ENOSPC;
goto out;
}
@ -1292,7 +1500,9 @@ out:
if (iscsilun->iscsi != NULL) {
iscsi_destroy_context(iscsilun->iscsi);
}
g_free(bs.opaque);
g_free(bs->opaque);
bs->opaque = NULL;
bdrv_unref(bs);
return ret;
}
@ -1310,6 +1520,7 @@ static BlockDriver bdrv_iscsi = {
.protocol_name = "iscsi",
.instance_size = sizeof(IscsiLun),
.bdrv_needs_filename = true,
.bdrv_file_open = iscsi_open,
.bdrv_close = iscsi_close,
.bdrv_create = iscsi_create,
@ -1318,11 +1529,15 @@ static BlockDriver bdrv_iscsi = {
.bdrv_getlength = iscsi_getlength,
.bdrv_truncate = iscsi_truncate,
#if defined(LIBISCSI_FEATURE_IOVECTOR)
.bdrv_co_get_block_status = iscsi_co_get_block_status,
#endif
.bdrv_co_discard = iscsi_co_discard,
.bdrv_aio_readv = iscsi_aio_readv,
.bdrv_aio_writev = iscsi_aio_writev,
.bdrv_aio_flush = iscsi_aio_flush,
.bdrv_aio_discard = iscsi_aio_discard,
.bdrv_has_zero_init = iscsi_has_zero_init,
#ifdef __linux__


@ -39,7 +39,6 @@ struct qemu_laiocb {
struct qemu_laio_state {
io_context_t ctx;
EventNotifier e;
int count;
};
static inline ssize_t io_event_ret(struct io_event *ev)
@ -55,8 +54,6 @@ static void qemu_laio_process_completion(struct qemu_laio_state *s,
{
int ret;
s->count--;
ret = laiocb->ret;
if (ret != -ECANCELED) {
if (ret == laiocb->nbytes) {
@ -101,13 +98,6 @@ static void qemu_laio_completion_cb(EventNotifier *e)
}
}
static int qemu_laio_flush_cb(EventNotifier *e)
{
struct qemu_laio_state *s = container_of(e, struct qemu_laio_state, e);
return (s->count > 0) ? 1 : 0;
}
static void laio_cancel(BlockDriverAIOCB *blockacb)
{
struct qemu_laiocb *laiocb = (struct qemu_laiocb *)blockacb;
@ -177,14 +167,11 @@ BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
goto out_free_aiocb;
}
io_set_eventfd(&laiocb->iocb, event_notifier_get_fd(&s->e));
s->count++;
if (io_submit(s->ctx, 1, &iocbs) < 0)
goto out_dec_count;
goto out_free_aiocb;
return &laiocb->common;
out_dec_count:
s->count--;
out_free_aiocb:
qemu_aio_release(laiocb);
return NULL;
@ -203,8 +190,7 @@ void *laio_init(void)
goto out_close_efd;
}
qemu_aio_set_event_notifier(&s->e, qemu_laio_completion_cb,
qemu_laio_flush_cb);
qemu_aio_set_event_notifier(&s->e, qemu_laio_completion_cb);
return s;


@ -338,8 +338,8 @@ static void coroutine_fn mirror_run(void *opaque)
base = s->mode == MIRROR_SYNC_MODE_FULL ? NULL : bs->backing_hd;
for (sector_num = 0; sector_num < end; ) {
int64_t next = (sector_num | (sectors_per_chunk - 1)) + 1;
ret = bdrv_co_is_allocated_above(bs, base,
sector_num, next - sector_num, &n);
ret = bdrv_is_allocated_above(bs, base,
sector_num, next - sector_num, &n);
if (ret < 0) {
goto immediate_exit;
@ -356,7 +356,7 @@ static void coroutine_fn mirror_run(void *opaque)
}
bdrv_dirty_iter_init(bs, &s->hbi);
last_pause_ns = qemu_get_clock_ns(rt_clock);
last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
for (;;) {
uint64_t delay_ns;
int64_t cnt;
@ -374,7 +374,7 @@ static void coroutine_fn mirror_run(void *opaque)
* We do so every SLICE_TIME nanoseconds, or when there is an error,
* or when the source is clean, whichever comes first.
*/
if (qemu_get_clock_ns(rt_clock) - last_pause_ns < SLICE_TIME &&
if (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - last_pause_ns < SLICE_TIME &&
s->common.iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
if (s->in_flight == MAX_IN_FLIGHT || s->buf_free_count == 0 ||
(cnt == 0 && s->in_flight > 0)) {
@ -439,13 +439,13 @@ static void coroutine_fn mirror_run(void *opaque)
delay_ns = 0;
}
block_job_sleep_ns(&s->common, rt_clock, delay_ns);
block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
if (block_job_is_cancelled(&s->common)) {
break;
}
} else if (!should_complete) {
delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0);
block_job_sleep_ns(&s->common, rt_clock, delay_ns);
block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
} else if (cnt == 0) {
/* The two disks are in sync. Exit and report successful
* completion.
@ -454,7 +454,7 @@ static void coroutine_fn mirror_run(void *opaque)
s->common.cancelled = false;
break;
}
last_pause_ns = qemu_get_clock_ns(rt_clock);
last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
}
immediate_exit:
@ -480,7 +480,7 @@ immediate_exit:
bdrv_swap(s->target, s->common.bs);
}
bdrv_close(s->target);
bdrv_delete(s->target);
bdrv_unref(s->target);
block_job_completed(&s->common, ret);
}
@ -505,14 +505,15 @@ static void mirror_iostatus_reset(BlockJob *job)
static void mirror_complete(BlockJob *job, Error **errp)
{
MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
Error *local_err = NULL;
int ret;
ret = bdrv_open_backing_file(s->target, NULL);
ret = bdrv_open_backing_file(s->target, NULL, &local_err);
if (ret < 0) {
char backing_filename[PATH_MAX];
bdrv_get_full_backing_filename(s->target, backing_filename,
sizeof(backing_filename));
error_setg_file_open(errp, -ret, backing_filename);
error_propagate(errp, local_err);
return;
}
if (!s->synced) {
@ -524,9 +525,9 @@ static void mirror_complete(BlockJob *job, Error **errp)
block_job_resume(job);
}
static const BlockJobType mirror_job_type = {
static const BlockJobDriver mirror_job_driver = {
.instance_size = sizeof(MirrorBlockJob),
.job_type = "mirror",
.job_type = BLOCK_JOB_TYPE_MIRROR,
.set_speed = mirror_set_speed,
.iostatus_reset= mirror_iostatus_reset,
.complete = mirror_complete,
@ -562,7 +563,7 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
return;
}
s = block_job_create(&mirror_job_type, bs, speed, cb, opaque, errp);
s = block_job_create(&mirror_job_driver, bs, speed, cb, opaque, errp);
if (!s) {
return;
}


@ -279,13 +279,6 @@ static void nbd_coroutine_start(BDRVNBDState *s, struct nbd_request *request)
request->handle = INDEX_TO_HANDLE(s, i);
}
static int nbd_have_request(void *opaque)
{
BDRVNBDState *s = opaque;
return s->in_flight > 0;
}
static void nbd_reply_ready(void *opaque)
{
BDRVNBDState *s = opaque;
@ -341,8 +334,7 @@ static int nbd_co_send_request(BDRVNBDState *s, struct nbd_request *request,
qemu_co_mutex_lock(&s->send_mutex);
s->send_coroutine = qemu_coroutine_self();
qemu_aio_set_fd_handler(s->sock, nbd_reply_ready, nbd_restart_write,
nbd_have_request, s);
qemu_aio_set_fd_handler(s->sock, nbd_reply_ready, nbd_restart_write, s);
if (qiov) {
if (!s->is_unix) {
socket_set_cork(s->sock, 1);
@ -361,8 +353,7 @@ static int nbd_co_send_request(BDRVNBDState *s, struct nbd_request *request,
} else {
rc = nbd_send_request(s->sock, request);
}
qemu_aio_set_fd_handler(s->sock, nbd_reply_ready, NULL,
nbd_have_request, s);
qemu_aio_set_fd_handler(s->sock, nbd_reply_ready, NULL, s);
s->send_coroutine = NULL;
qemu_co_mutex_unlock(&s->send_mutex);
return rc;
@ -438,8 +429,7 @@ static int nbd_establish_connection(BlockDriverState *bs)
/* Now that we're connected, set the socket to be non-blocking and
* kick the reply mechanism. */
qemu_set_nonblock(sock);
qemu_aio_set_fd_handler(sock, nbd_reply_ready, NULL,
nbd_have_request, s);
qemu_aio_set_fd_handler(sock, nbd_reply_ready, NULL, s);
s->sock = sock;
s->size = size;
@ -459,11 +449,12 @@ static void nbd_teardown_connection(BlockDriverState *bs)
request.len = 0;
nbd_send_request(s->sock, &request);
qemu_aio_set_fd_handler(s->sock, NULL, NULL, NULL, NULL);
qemu_aio_set_fd_handler(s->sock, NULL, NULL, NULL);
closesocket(s->sock);
}
static int nbd_open(BlockDriverState *bs, QDict *options, int flags)
static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVNBDState *s = bs->opaque;
int result;


@ -68,7 +68,8 @@ static int parallels_probe(const uint8_t *buf, int buf_size, const char *filenam
return 0;
}
static int parallels_open(BlockDriverState *bs, QDict *options, int flags)
static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVParallelsState *s = bs->opaque;
int i;


@ -25,6 +25,9 @@
#include "block/qapi.h"
#include "block/block_int.h"
#include "qmp-commands.h"
#include "qapi-visit.h"
#include "qapi/qmp-output-visitor.h"
#include "qapi/qmp/types.h"
/*
* Returns 0 on success, with *p_list either set to describe snapshot
@ -134,6 +137,9 @@ void bdrv_query_image_info(BlockDriverState *bs,
info->dirty_flag = bdi.is_dirty;
info->has_dirty_flag = true;
}
info->format_specific = bdrv_get_specific_info(bs);
info->has_format_specific = info->format_specific != NULL;
backing_filename = bs->backing_file;
if (backing_filename[0] != '\0') {
info->backing_filename = g_strdup(backing_filename);
@ -223,18 +229,44 @@ void bdrv_query_info(BlockDriverState *bs,
info->inserted->backing_file_depth = bdrv_get_backing_file_depth(bs);
if (bs->io_limits_enabled) {
info->inserted->bps =
bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
info->inserted->bps_rd =
bs->io_limits.bps[BLOCK_IO_LIMIT_READ];
info->inserted->bps_wr =
bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE];
info->inserted->iops =
bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
info->inserted->iops_rd =
bs->io_limits.iops[BLOCK_IO_LIMIT_READ];
info->inserted->iops_wr =
bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE];
ThrottleConfig cfg;
throttle_get_config(&bs->throttle_state, &cfg);
info->inserted->bps = cfg.buckets[THROTTLE_BPS_TOTAL].avg;
info->inserted->bps_rd = cfg.buckets[THROTTLE_BPS_READ].avg;
info->inserted->bps_wr = cfg.buckets[THROTTLE_BPS_WRITE].avg;
info->inserted->iops = cfg.buckets[THROTTLE_OPS_TOTAL].avg;
info->inserted->iops_rd = cfg.buckets[THROTTLE_OPS_READ].avg;
info->inserted->iops_wr = cfg.buckets[THROTTLE_OPS_WRITE].avg;
info->inserted->has_bps_max =
cfg.buckets[THROTTLE_BPS_TOTAL].max;
info->inserted->bps_max =
cfg.buckets[THROTTLE_BPS_TOTAL].max;
info->inserted->has_bps_rd_max =
cfg.buckets[THROTTLE_BPS_READ].max;
info->inserted->bps_rd_max =
cfg.buckets[THROTTLE_BPS_READ].max;
info->inserted->has_bps_wr_max =
cfg.buckets[THROTTLE_BPS_WRITE].max;
info->inserted->bps_wr_max =
cfg.buckets[THROTTLE_BPS_WRITE].max;
info->inserted->has_iops_max =
cfg.buckets[THROTTLE_OPS_TOTAL].max;
info->inserted->iops_max =
cfg.buckets[THROTTLE_OPS_TOTAL].max;
info->inserted->has_iops_rd_max =
cfg.buckets[THROTTLE_OPS_READ].max;
info->inserted->iops_rd_max =
cfg.buckets[THROTTLE_OPS_READ].max;
info->inserted->has_iops_wr_max =
cfg.buckets[THROTTLE_OPS_WRITE].max;
info->inserted->iops_wr_max =
cfg.buckets[THROTTLE_OPS_WRITE].max;
info->inserted->has_iops_size = cfg.op_size;
info->inserted->iops_size = cfg.op_size;
}
bs0 = bs;
@ -397,6 +429,119 @@ void bdrv_snapshot_dump(fprintf_function func_fprintf, void *f,
}
}
static void dump_qdict(fprintf_function func_fprintf, void *f, int indentation,
QDict *dict);
static void dump_qlist(fprintf_function func_fprintf, void *f, int indentation,
QList *list);
static void dump_qobject(fprintf_function func_fprintf, void *f,
int comp_indent, QObject *obj)
{
switch (qobject_type(obj)) {
case QTYPE_QINT: {
QInt *value = qobject_to_qint(obj);
func_fprintf(f, "%" PRId64, qint_get_int(value));
break;
}
case QTYPE_QSTRING: {
QString *value = qobject_to_qstring(obj);
func_fprintf(f, "%s", qstring_get_str(value));
break;
}
case QTYPE_QDICT: {
QDict *value = qobject_to_qdict(obj);
dump_qdict(func_fprintf, f, comp_indent, value);
break;
}
case QTYPE_QLIST: {
QList *value = qobject_to_qlist(obj);
dump_qlist(func_fprintf, f, comp_indent, value);
break;
}
case QTYPE_QFLOAT: {
QFloat *value = qobject_to_qfloat(obj);
func_fprintf(f, "%g", qfloat_get_double(value));
break;
}
case QTYPE_QBOOL: {
QBool *value = qobject_to_qbool(obj);
func_fprintf(f, "%s", qbool_get_int(value) ? "true" : "false");
break;
}
case QTYPE_QERROR: {
QString *value = qerror_human((QError *)obj);
func_fprintf(f, "%s", qstring_get_str(value));
break;
}
case QTYPE_NONE:
break;
case QTYPE_MAX:
default:
abort();
}
}
static void dump_qlist(fprintf_function func_fprintf, void *f, int indentation,
QList *list)
{
const QListEntry *entry;
int i = 0;
for (entry = qlist_first(list); entry; entry = qlist_next(entry), i++) {
qtype_code type = qobject_type(entry->value);
bool composite = (type == QTYPE_QDICT || type == QTYPE_QLIST);
const char *format = composite ? "%*s[%i]:\n" : "%*s[%i]: ";
func_fprintf(f, format, indentation * 4, "", i);
dump_qobject(func_fprintf, f, indentation + 1, entry->value);
if (!composite) {
func_fprintf(f, "\n");
}
}
}
static void dump_qdict(fprintf_function func_fprintf, void *f, int indentation,
QDict *dict)
{
const QDictEntry *entry;
for (entry = qdict_first(dict); entry; entry = qdict_next(dict, entry)) {
qtype_code type = qobject_type(entry->value);
bool composite = (type == QTYPE_QDICT || type == QTYPE_QLIST);
const char *format = composite ? "%*s%s:\n" : "%*s%s: ";
char key[strlen(entry->key) + 1];
int i;
/* replace dashes with spaces in key (variable) names */
for (i = 0; entry->key[i]; i++) {
key[i] = entry->key[i] == '-' ? ' ' : entry->key[i];
}
key[i] = 0;
func_fprintf(f, format, indentation * 4, "", key);
dump_qobject(func_fprintf, f, indentation + 1, entry->value);
if (!composite) {
func_fprintf(f, "\n");
}
}
}
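The key rewriting done inside dump_qdict above (dashes in QMP key names become spaces for human-readable output) can be shown on its own. This is a minimal sketch under assumptions: `sk_key_to_label` is a hypothetical helper name, and it writes into a caller-supplied buffer instead of the variable-length array used in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy key into out, replacing every '-' with ' '.
 * out must have room for the key plus the terminating NUL.
 */
static void sk_key_to_label(const char *key, char *out, size_t out_size)
{
    size_t i;

    assert(out_size > strlen(key));

    for (i = 0; key[i]; i++) {
        out[i] = key[i] == '-' ? ' ' : key[i];
    }
    out[i] = '\0';
}
```

A key such as "lazy-refcounts" would then be printed as "lazy refcounts".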
void bdrv_image_info_specific_dump(fprintf_function func_fprintf, void *f,
ImageInfoSpecific *info_spec)
{
Error *local_err = NULL;
QmpOutputVisitor *ov = qmp_output_visitor_new();
QObject *obj, *data;
visit_type_ImageInfoSpecific(qmp_output_get_visitor(ov), &info_spec, NULL,
&local_err);
obj = qmp_output_get_qobject(ov);
assert(qobject_type(obj) == QTYPE_QDICT);
data = qdict_get(qobject_to_qdict(obj), "data");
dump_qobject(func_fprintf, f, 1, data);
qmp_output_visitor_cleanup(ov);
}
void bdrv_image_info_dump(fprintf_function func_fprintf, void *f,
ImageInfo *info)
{
@ -467,4 +612,9 @@ void bdrv_image_info_dump(fprintf_function func_fprintf, void *f,
func_fprintf(f, "\n");
}
}
if (info->has_format_specific) {
func_fprintf(f, "Format specific information:\n");
bdrv_image_info_specific_dump(func_fprintf, f, info->format_specific);
}
}


@ -92,7 +92,8 @@ static int qcow_probe(const uint8_t *buf, int buf_size, const char *filename)
return 0;
}
static int qcow_open(BlockDriverState *bs, QDict *options, int flags)
static int qcow_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVQcowState *s = bs->opaque;
int len, i, shift, ret;
@ -395,7 +396,7 @@ static uint64_t get_cluster_offset(BlockDriverState *bs,
return cluster_offset;
}
static int coroutine_fn qcow_co_is_allocated(BlockDriverState *bs,
static int64_t coroutine_fn qcow_co_get_block_status(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, int *pnum)
{
BDRVQcowState *s = bs->opaque;
@ -410,7 +411,14 @@ static int coroutine_fn qcow_co_is_allocated(BlockDriverState *bs,
if (n > nb_sectors)
n = nb_sectors;
*pnum = n;
return (cluster_offset != 0);
if (!cluster_offset) {
return 0;
}
if ((cluster_offset & QCOW_OFLAG_COMPRESSED) || s->crypt_method) {
return BDRV_BLOCK_DATA;
}
cluster_offset |= (index_in_cluster << BDRV_SECTOR_BITS);
return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | cluster_offset;
}
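The new return value of qcow_co_get_block_status above carries both status flags and a sector-aligned host offset in a single int64_t. The following is a minimal sketch of that packing idea only. The `SK_*` constants and helper names are assumptions chosen for the example (9 sector bits, flags in the low bits); they are not taken from QEMU's headers.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants, assumed for this sketch. */
#define SK_SECTOR_BITS        9
#define SK_BLOCK_DATA         0x1
#define SK_BLOCK_ZERO         0x2
#define SK_BLOCK_OFFSET_VALID 0x4
#define SK_OFFSET_MASK        (~((INT64_C(1) << SK_SECTOR_BITS) - 1))

/* Combine a sector-aligned byte offset with status flags. */
static int64_t sk_pack_status(int64_t byte_offset, int flags)
{
    assert(byte_offset >= 0);
    assert((byte_offset & ~SK_OFFSET_MASK) == 0); /* must be sector aligned */
    assert((flags & SK_OFFSET_MASK) == 0);        /* flags fit in low bits */
    return byte_offset | flags;
}

/* Extract the byte offset from a packed status value. */
static int64_t sk_status_offset(int64_t status)
{
    return status & SK_OFFSET_MASK;
}

/* Extract the flag bits from a packed status value. */
static int sk_status_flags(int64_t status)
{
    return (int)(status & ~SK_OFFSET_MASK);
}
```

Because the offset is always a multiple of the sector size, its low bits are free to hold the flags, which is why the function can return a single value instead of the old boolean.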
static int decompress_buffer(uint8_t *out_buf, int out_buf_size,
@ -651,7 +659,8 @@ static void qcow_close(BlockDriverState *bs)
error_free(s->migration_blocker);
}
static int qcow_create(const char *filename, QEMUOptionParameter *options)
static int qcow_create(const char *filename, QEMUOptionParameter *options,
Error **errp)
{
int header_size, backing_filename_len, l1_size, shift, i;
QCowHeader header;
@ -659,6 +668,7 @@ static int qcow_create(const char *filename, QEMUOptionParameter *options)
int64_t total_size = 0;
const char *backing_file = NULL;
int flags = 0;
Error *local_err = NULL;
int ret;
BlockDriverState *qcow_bs;
@ -674,13 +684,17 @@ static int qcow_create(const char *filename, QEMUOptionParameter *options)
options++;
}
ret = bdrv_create_file(filename, options);
ret = bdrv_create_file(filename, options, &local_err);
if (ret < 0) {
qerror_report_err(local_err);
error_free(local_err);
return ret;
}
ret = bdrv_file_open(&qcow_bs, filename, NULL, BDRV_O_RDWR);
ret = bdrv_file_open(&qcow_bs, filename, NULL, BDRV_O_RDWR, &local_err);
if (ret < 0) {
qerror_report_err(local_err);
error_free(local_err);
return ret;
}
@ -751,7 +765,7 @@ static int qcow_create(const char *filename, QEMUOptionParameter *options)
g_free(tmp);
ret = 0;
exit:
bdrv_delete(qcow_bs);
bdrv_unref(qcow_bs);
return ret;
}
@ -896,7 +910,7 @@ static BlockDriver bdrv_qcow = {
.bdrv_co_readv = qcow_co_readv,
.bdrv_co_writev = qcow_co_writev,
.bdrv_co_is_allocated = qcow_co_is_allocated,
.bdrv_co_get_block_status = qcow_co_get_block_status,
.bdrv_set_key = qcow_set_key,
.bdrv_make_empty = qcow_make_empty,


@ -114,6 +114,21 @@ static int qcow2_cache_entry_flush(BlockDriverState *bs, Qcow2Cache *c, int i)
return ret;
}
if (c == s->refcount_block_cache) {
ret = qcow2_pre_write_overlap_check(bs, QCOW2_OL_REFCOUNT_BLOCK,
c->entries[i].offset, s->cluster_size);
} else if (c == s->l2_table_cache) {
ret = qcow2_pre_write_overlap_check(bs, QCOW2_OL_ACTIVE_L2,
c->entries[i].offset, s->cluster_size);
} else {
ret = qcow2_pre_write_overlap_check(bs, 0,
c->entries[i].offset, s->cluster_size);
}
if (ret < 0) {
return ret;
}
if (c == s->refcount_block_cache) {
BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_UPDATE_PART);
} else if (c == s->l2_table_cache) {
@ -185,6 +200,24 @@ void qcow2_cache_depends_on_flush(Qcow2Cache *c)
c->depends_on_flush = true;
}
int qcow2_cache_empty(BlockDriverState *bs, Qcow2Cache *c)
{
int ret, i;
ret = qcow2_cache_flush(bs, c);
if (ret < 0) {
return ret;
}
for (i = 0; i < c->size; i++) {
assert(c->entries[i].ref == 0);
c->entries[i].offset = 0;
c->entries[i].cache_hits = 0;
}
return 0;
}
static int qcow2_cache_find_entry_to_replace(Qcow2Cache *c)
{
int i;


@ -35,6 +35,7 @@ int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
BDRVQcowState *s = bs->opaque;
int new_l1_size2, ret, i;
uint64_t *new_l1_table;
int64_t old_l1_table_offset, old_l1_size;
int64_t new_l1_table_offset, new_l1_size;
uint8_t data[12];
@ -80,6 +81,14 @@ int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
goto fail;
}
/* the L1 position has not yet been updated, so these clusters must
* indeed be completely free */
ret = qcow2_pre_write_overlap_check(bs, 0, new_l1_table_offset,
new_l1_size2);
if (ret < 0) {
goto fail;
}
BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_WRITE_TABLE);
for(i = 0; i < s->l1_size; i++)
new_l1_table[i] = cpu_to_be64(new_l1_table[i]);
@ -92,17 +101,19 @@ int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
/* set new table */
BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_ACTIVATE_TABLE);
cpu_to_be32w((uint32_t*)data, new_l1_size);
cpu_to_be64wu((uint64_t*)(data + 4), new_l1_table_offset);
stq_be_p(data + 4, new_l1_table_offset);
ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, l1_size), data,sizeof(data));
if (ret < 0) {
goto fail;
}
g_free(s->l1_table);
qcow2_free_clusters(bs, s->l1_table_offset, s->l1_size * sizeof(uint64_t),
QCOW2_DISCARD_OTHER);
old_l1_table_offset = s->l1_table_offset;
s->l1_table_offset = new_l1_table_offset;
s->l1_table = new_l1_table;
old_l1_size = s->l1_size;
s->l1_size = new_l1_size;
qcow2_free_clusters(bs, old_l1_table_offset, old_l1_size * sizeof(uint64_t),
QCOW2_DISCARD_OTHER);
return 0;
fail:
g_free(new_l1_table);
@ -137,7 +148,7 @@ static int l2_load(BlockDriverState *bs, uint64_t l2_offset,
* and we really don't want bdrv_pread to perform a read-modify-write)
*/
#define L1_ENTRIES_PER_SECTOR (512 / 8)
static int write_l1_entry(BlockDriverState *bs, int l1_index)
int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index)
{
BDRVQcowState *s = bs->opaque;
uint64_t buf[L1_ENTRIES_PER_SECTOR];
@ -149,6 +160,12 @@ static int write_l1_entry(BlockDriverState *bs, int l1_index)
buf[i] = cpu_to_be64(s->l1_table[l1_start_index + i]);
}
ret = qcow2_pre_write_overlap_check(bs, QCOW2_OL_ACTIVE_L1,
s->l1_table_offset + 8 * l1_start_index, sizeof(buf));
if (ret < 0) {
return ret;
}
BLKDBG_EVENT(bs->file, BLKDBG_L1_UPDATE);
ret = bdrv_pwrite_sync(bs->file, s->l1_table_offset + 8 * l1_start_index,
buf, sizeof(buf));
@ -173,7 +190,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
{
BDRVQcowState *s = bs->opaque;
uint64_t old_l2_offset;
uint64_t *l2_table;
uint64_t *l2_table = NULL;
int64_t l2_offset;
int ret;
@ -185,7 +202,8 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
l2_offset = qcow2_alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
if (l2_offset < 0) {
return l2_offset;
ret = l2_offset;
goto fail;
}
ret = qcow2_cache_flush(bs, s->refcount_block_cache);
@ -198,7 +216,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
trace_qcow2_l2_allocate_get_empty(bs, l1_index);
ret = qcow2_cache_get_empty(bs, s->l2_table_cache, l2_offset, (void**) table);
if (ret < 0) {
return ret;
goto fail;
}
l2_table = *table;
@ -239,7 +257,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
/* update the L1 entry */
trace_qcow2_l2_allocate_write_l1(bs, l1_index);
s->l1_table[l1_index] = l2_offset | QCOW_OFLAG_COPIED;
ret = write_l1_entry(bs, l1_index);
ret = qcow2_write_l1_entry(bs, l1_index);
if (ret < 0) {
goto fail;
}
@ -250,8 +268,14 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
fail:
trace_qcow2_l2_allocate_done(bs, l1_index, ret);
qcow2_cache_put(bs, s->l2_table_cache, (void**) table);
if (l2_table != NULL) {
qcow2_cache_put(bs, s->l2_table_cache, (void**) table);
}
s->l1_table[l1_index] = old_l2_offset;
if (l2_offset > 0) {
qcow2_free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t),
QCOW2_DISCARD_ALWAYS);
}
return ret;
}
@ -263,23 +287,26 @@ fail:
* cluster which may require a different handling)
*/
static int count_contiguous_clusters(uint64_t nb_clusters, int cluster_size,
uint64_t *l2_table, uint64_t start, uint64_t stop_flags)
uint64_t *l2_table, uint64_t stop_flags)
{
int i;
uint64_t mask = stop_flags | L2E_OFFSET_MASK;
uint64_t offset = be64_to_cpu(l2_table[0]) & mask;
uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
uint64_t first_entry = be64_to_cpu(l2_table[0]);
uint64_t offset = first_entry & mask;
if (!offset)
return 0;
for (i = start; i < start + nb_clusters; i++) {
assert(qcow2_get_cluster_type(first_entry) != QCOW2_CLUSTER_COMPRESSED);
for (i = 0; i < nb_clusters; i++) {
uint64_t l2_entry = be64_to_cpu(l2_table[i]) & mask;
if (offset + (uint64_t) i * cluster_size != l2_entry) {
break;
}
}
return (i - start);
return i;
}
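
The counting logic above can be illustrated outside of QEMU. The following is a minimal sketch, not the driver's code: it assumes table entries are already in host byte order (the real function converts with be64_to_cpu), and DEMO_OFFSET_MASK, DEMO_FLAG_ZERO and demo_count_contiguous are hypothetical stand-ins for the qcow2 definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the qcow2 offset mask and zero flag. */
#define DEMO_OFFSET_MASK 0x00fffffffffffe00ULL
#define DEMO_FLAG_ZERO   (1ULL << 0)

/*
 * Counts how many entries, starting at table[0], describe clusters that
 * follow each other in the image file. Any flag named in stop_flags is
 * kept in the comparison, so a change in that flag ends the run.
 */
static int demo_count_contiguous(uint64_t nb_clusters, int cluster_size,
                                 const uint64_t *table, uint64_t stop_flags)
{
    uint64_t mask = stop_flags | DEMO_OFFSET_MASK;
    uint64_t offset = table[0] & mask;
    uint64_t i;

    assert(cluster_size > 0);
    if (!offset) {
        return 0; /* the first entry is unallocated */
    }
    for (i = 0; i < nb_clusters; i++) {
        uint64_t entry = table[i] & mask;
        if (offset + i * (uint64_t)cluster_size != entry) {
            break;
        }
    }
    return (int)i;
}
```

Because the caller passes a pointer to the first entry of interest, the loop can always start at index 0, which is the simplification the hunk above makes by dropping the start parameter.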
static int count_contiguous_free_clusters(uint64_t nb_clusters, uint64_t *l2_table)
@ -368,6 +395,12 @@ static int coroutine_fn copy_sectors(BlockDriverState *bs,
&s->aes_encrypt_key);
}
ret = qcow2_pre_write_overlap_check(bs, 0,
cluster_offset + n_start * BDRV_SECTOR_SIZE, n * BDRV_SECTOR_SIZE);
if (ret < 0) {
goto out;
}
BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
ret = bdrv_co_writev(bs->file, (cluster_offset >> 9) + n_start, n, &qiov);
if (ret < 0) {
@ -466,8 +499,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
return -EIO;
}
c = count_contiguous_clusters(nb_clusters, s->cluster_size,
&l2_table[l2_index], 0,
QCOW_OFLAG_COMPRESSED | QCOW_OFLAG_ZERO);
&l2_table[l2_index], QCOW_OFLAG_ZERO);
*cluster_offset = 0;
break;
case QCOW2_CLUSTER_UNALLOCATED:
@ -478,8 +510,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
case QCOW2_CLUSTER_NORMAL:
/* how many allocated clusters ? */
c = count_contiguous_clusters(nb_clusters, s->cluster_size,
&l2_table[l2_index], 0,
QCOW_OFLAG_COMPRESSED | QCOW_OFLAG_ZERO);
&l2_table[l2_index], QCOW_OFLAG_ZERO);
*cluster_offset &= L2E_OFFSET_MASK;
break;
default:
@ -695,6 +726,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
}
qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
assert(l2_index + m->nb_clusters <= s->l2_size);
for (i = 0; i < m->nb_clusters; i++) {
/* If two concurrent writes hit the same unallocated cluster,
* each write allocates a separate cluster and writes its data concurrently.
@ -908,7 +940,7 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
/* We keep all QCOW_OFLAG_COPIED clusters */
keep_clusters =
count_contiguous_clusters(nb_clusters, s->cluster_size,
&l2_table[l2_index], 0,
&l2_table[l2_index],
QCOW_OFLAG_COPIED | QCOW_OFLAG_ZERO);
assert(keep_clusters <= nb_clusters);
@ -1317,7 +1349,7 @@ int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset)
* clusters.
*/
static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
unsigned int nb_clusters)
unsigned int nb_clusters, enum qcow2_discard_type type)
{
BDRVQcowState *s = bs->opaque;
uint64_t *l2_table;
@ -1346,7 +1378,7 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
l2_table[l2_index + i] = cpu_to_be64(0);
/* Then decrease the refcount */
qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
qcow2_free_any_clusters(bs, old_offset, 1, type);
}
ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
@ -1358,7 +1390,7 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
}
int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
int nb_sectors)
int nb_sectors, enum qcow2_discard_type type)
{
BDRVQcowState *s = bs->opaque;
uint64_t end_offset;
@ -1381,7 +1413,7 @@ int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
/* Each L2 table is handled by its own loop iteration */
while (nb_clusters > 0) {
ret = discard_single_l2(bs, offset, nb_clusters);
ret = discard_single_l2(bs, offset, nb_clusters, type);
if (ret < 0) {
goto fail;
}
@ -1476,3 +1508,255 @@ fail:
return ret;
}
/*
* Expands all zero clusters in a specific L1 table (or deallocates them, for
* non-backed non-pre-allocated zero clusters).
*
* expanded_clusters is a bitmap where every bit corresponds to one cluster in
* the image file; a bit gets set if the corresponding cluster has been used for
* zero expansion (i.e., has been filled with zeroes and is referenced from an
* L2 table). nb_clusters contains the total cluster count of the image file,
* i.e., the number of bits in expanded_clusters.
*/
static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
int l1_size, uint8_t **expanded_clusters,
uint64_t *nb_clusters)
{
BDRVQcowState *s = bs->opaque;
bool is_active_l1 = (l1_table == s->l1_table);
uint64_t *l2_table = NULL;
int ret;
int i, j;
if (!is_active_l1) {
/* inactive L2 tables require a buffer to be stored in when loading
* them from disk */
l2_table = qemu_blockalign(bs, s->cluster_size);
}
for (i = 0; i < l1_size; i++) {
uint64_t l2_offset = l1_table[i] & L1E_OFFSET_MASK;
bool l2_dirty = false;
if (!l2_offset) {
/* unallocated */
continue;
}
if (is_active_l1) {
/* get active L2 tables from cache */
ret = qcow2_cache_get(bs, s->l2_table_cache, l2_offset,
(void **)&l2_table);
} else {
/* load inactive L2 tables from disk */
ret = bdrv_read(bs->file, l2_offset / BDRV_SECTOR_SIZE,
(void *)l2_table, s->cluster_sectors);
}
if (ret < 0) {
goto fail;
}
for (j = 0; j < s->l2_size; j++) {
uint64_t l2_entry = be64_to_cpu(l2_table[j]);
int64_t offset = l2_entry & L2E_OFFSET_MASK, cluster_index;
int cluster_type = qcow2_get_cluster_type(l2_entry);
bool preallocated = offset != 0;
if (cluster_type == QCOW2_CLUSTER_NORMAL) {
cluster_index = offset >> s->cluster_bits;
assert((cluster_index >= 0) && (cluster_index < *nb_clusters));
if ((*expanded_clusters)[cluster_index / 8] &
(1 << (cluster_index % 8))) {
/* Probably a shared L2 table; this cluster was a zero
* cluster which has been expanded, its refcount
* therefore most likely requires an update. */
ret = qcow2_update_cluster_refcount(bs, cluster_index, 1,
QCOW2_DISCARD_NEVER);
if (ret < 0) {
goto fail;
}
/* Since we just increased the refcount, the COPIED flag may
* no longer be set. */
l2_table[j] = cpu_to_be64(l2_entry & ~QCOW_OFLAG_COPIED);
l2_dirty = true;
}
continue;
}
else if (qcow2_get_cluster_type(l2_entry) != QCOW2_CLUSTER_ZERO) {
continue;
}
if (!preallocated) {
if (!bs->backing_hd) {
/* not backed; therefore we can simply deallocate the
* cluster */
l2_table[j] = 0;
l2_dirty = true;
continue;
}
offset = qcow2_alloc_clusters(bs, s->cluster_size);
if (offset < 0) {
ret = offset;
goto fail;
}
}
ret = qcow2_pre_write_overlap_check(bs, 0, offset, s->cluster_size);
if (ret < 0) {
if (!preallocated) {
qcow2_free_clusters(bs, offset, s->cluster_size,
QCOW2_DISCARD_ALWAYS);
}
goto fail;
}
ret = bdrv_write_zeroes(bs->file, offset / BDRV_SECTOR_SIZE,
s->cluster_sectors);
if (ret < 0) {
if (!preallocated) {
qcow2_free_clusters(bs, offset, s->cluster_size,
QCOW2_DISCARD_ALWAYS);
}
goto fail;
}
l2_table[j] = cpu_to_be64(offset | QCOW_OFLAG_COPIED);
l2_dirty = true;
cluster_index = offset >> s->cluster_bits;
if (cluster_index >= *nb_clusters) {
uint64_t old_bitmap_size = (*nb_clusters + 7) / 8;
uint64_t new_bitmap_size;
/* The offset may lie beyond the old end of the underlying image
* file for growable files only */
assert(bs->file->growable);
*nb_clusters = size_to_clusters(s, bs->file->total_sectors *
BDRV_SECTOR_SIZE);
new_bitmap_size = (*nb_clusters + 7) / 8;
*expanded_clusters = g_realloc(*expanded_clusters,
new_bitmap_size);
/* clear the newly allocated space */
memset(&(*expanded_clusters)[old_bitmap_size], 0,
new_bitmap_size - old_bitmap_size);
}
assert((cluster_index >= 0) && (cluster_index < *nb_clusters));
(*expanded_clusters)[cluster_index / 8] |= 1 << (cluster_index % 8);
}
if (is_active_l1) {
if (l2_dirty) {
qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
qcow2_cache_depends_on_flush(s->l2_table_cache);
}
ret = qcow2_cache_put(bs, s->l2_table_cache, (void **)&l2_table);
if (ret < 0) {
l2_table = NULL;
goto fail;
}
} else {
if (l2_dirty) {
ret = qcow2_pre_write_overlap_check(bs,
QCOW2_OL_INACTIVE_L2 | QCOW2_OL_ACTIVE_L2, l2_offset,
s->cluster_size);
if (ret < 0) {
goto fail;
}
ret = bdrv_write(bs->file, l2_offset / BDRV_SECTOR_SIZE,
(void *)l2_table, s->cluster_sectors);
if (ret < 0) {
goto fail;
}
}
}
}
ret = 0;
fail:
if (l2_table) {
if (!is_active_l1) {
qemu_vfree(l2_table);
} else {
if (ret < 0) {
qcow2_cache_put(bs, s->l2_table_cache, (void **)&l2_table);
} else {
ret = qcow2_cache_put(bs, s->l2_table_cache,
(void **)&l2_table);
}
}
}
return ret;
}
/*
* For backed images, expands all zero clusters on the image. For non-backed
* images, deallocates all non-pre-allocated zero clusters (and claims the
* allocation for pre-allocated ones). This is important for downgrading to a
* qcow2 version which doesn't yet support metadata zero clusters.
*/
int qcow2_expand_zero_clusters(BlockDriverState *bs)
{
BDRVQcowState *s = bs->opaque;
uint64_t *l1_table = NULL;
uint64_t nb_clusters;
uint8_t *expanded_clusters;
int ret;
int i, j;
nb_clusters = size_to_clusters(s, bs->file->total_sectors *
BDRV_SECTOR_SIZE);
expanded_clusters = g_malloc0((nb_clusters + 7) / 8);
ret = expand_zero_clusters_in_l1(bs, s->l1_table, s->l1_size,
&expanded_clusters, &nb_clusters);
if (ret < 0) {
goto fail;
}
/* Inactive L1 tables may point to active L2 tables - therefore it is
* necessary to flush the L2 table cache before trying to access the L2
* tables pointed to by inactive L1 entries (else we might try to expand
* zero clusters that have already been expanded); furthermore, it is also
* necessary to empty the L2 table cache, since it may contain tables which
* are now going to be modified directly on disk, bypassing the cache.
* qcow2_cache_empty() does both for us. */
ret = qcow2_cache_empty(bs, s->l2_table_cache);
if (ret < 0) {
goto fail;
}
for (i = 0; i < s->nb_snapshots; i++) {
int l1_sectors = (s->snapshots[i].l1_size * sizeof(uint64_t) +
BDRV_SECTOR_SIZE - 1) / BDRV_SECTOR_SIZE;
l1_table = g_realloc(l1_table, l1_sectors * BDRV_SECTOR_SIZE);
ret = bdrv_read(bs->file, s->snapshots[i].l1_table_offset /
BDRV_SECTOR_SIZE, (void *)l1_table, l1_sectors);
if (ret < 0) {
goto fail;
}
for (j = 0; j < s->snapshots[i].l1_size; j++) {
be64_to_cpus(&l1_table[j]);
}
ret = expand_zero_clusters_in_l1(bs, l1_table, s->snapshots[i].l1_size,
&expanded_clusters, &nb_clusters);
if (ret < 0) {
goto fail;
}
}
ret = 0;
fail:
g_free(expanded_clusters);
g_free(l1_table);
return ret;
}
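
The expanded_clusters bookkeeping used by the two functions above is a plain one-bit-per-cluster bitmap that grows when the image file grows. A minimal sketch of that pattern follows, assuming standard malloc-family allocation instead of the glib helpers; DemoBitmap and the demo_bitmap_* functions are hypothetical names, not part of QEMU.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bitmap type: one bit per cluster of the image file. */
typedef struct {
    uint8_t *bits;
    uint64_t nb_bits;
} DemoBitmap;

static int demo_bitmap_init(DemoBitmap *b, uint64_t nb_bits)
{
    size_t size = (nb_bits + 7) / 8;

    b->bits = calloc(size ? size : 1, 1); /* zero-filled, like g_malloc0 */
    b->nb_bits = nb_bits;
    return b->bits ? 0 : -1;
}

/* Enlarges the bitmap and clears the newly added bytes. */
static int demo_bitmap_grow(DemoBitmap *b, uint64_t new_nb_bits)
{
    size_t old_size = (b->nb_bits + 7) / 8;
    size_t new_size = (new_nb_bits + 7) / 8;
    uint8_t *p;

    if (new_nb_bits <= b->nb_bits) {
        return 0;
    }
    p = realloc(b->bits, new_size);
    if (!p) {
        return -1;
    }
    memset(p + old_size, 0, new_size - old_size);
    b->bits = p;
    b->nb_bits = new_nb_bits;
    return 0;
}

static void demo_bitmap_set(DemoBitmap *b, uint64_t index)
{
    assert(index < b->nb_bits);
    b->bits[index / 8] |= (uint8_t)(1u << (index % 8));
}

static int demo_bitmap_test(const DemoBitmap *b, uint64_t index)
{
    assert(index < b->nb_bits);
    return (b->bits[index / 8] >> (index % 8)) & 1;
}
```

Clearing only the tail after a resize keeps the bits that were already recorded, which matters because later passes over snapshot L1 tables rely on them to detect clusters that were expanded earlier.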


@ -25,6 +25,8 @@
#include "qemu-common.h"
#include "block/block_int.h"
#include "block/qcow2.h"
#include "qemu/range.h"
#include "qapi/qmp/types.h"
static int64_t alloc_clusters_noref(BlockDriverState *bs, int64_t size);
static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
@ -599,10 +601,10 @@ fail:
* If the return value is non-negative, it is the new refcount of the cluster.
* If it is negative, it is -errno and indicates an error.
*/
static int update_cluster_refcount(BlockDriverState *bs,
int64_t cluster_index,
int addend,
enum qcow2_discard_type type)
int qcow2_update_cluster_refcount(BlockDriverState *bs,
int64_t cluster_index,
int addend,
enum qcow2_discard_type type)
{
BDRVQcowState *s = bs->opaque;
int ret;
@ -731,8 +733,8 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
if (free_in_cluster == 0)
s->free_byte_offset = 0;
if ((offset & (s->cluster_size - 1)) != 0)
update_cluster_refcount(bs, offset >> s->cluster_bits, 1,
QCOW2_DISCARD_NEVER);
qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits, 1,
QCOW2_DISCARD_NEVER);
} else {
offset = qcow2_alloc_clusters(bs, s->cluster_size);
if (offset < 0) {
@ -742,8 +744,8 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
if ((cluster_offset + s->cluster_size) == offset) {
/* we are lucky: contiguous data */
offset = s->free_byte_offset;
update_cluster_refcount(bs, offset >> s->cluster_bits, 1,
QCOW2_DISCARD_NEVER);
qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits, 1,
QCOW2_DISCARD_NEVER);
s->free_byte_offset += size;
} else {
s->free_byte_offset = offset;
@ -752,8 +754,8 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
}
/* The cluster refcount was incremented, either by qcow2_alloc_clusters()
* or explicitly by update_cluster_refcount(). Refcount blocks must be
* flushed before the caller's L2 table updates.
* or explicitly by qcow2_update_cluster_refcount(). Refcount blocks must
* be flushed before the caller's L2 table updates.
*/
qcow2_cache_set_dependency(bs, s->l2_table_cache, s->refcount_block_cache);
return offset;
@ -794,11 +796,13 @@ void qcow2_free_any_clusters(BlockDriverState *bs, uint64_t l2_entry,
}
break;
case QCOW2_CLUSTER_NORMAL:
qcow2_free_clusters(bs, l2_entry & L2E_OFFSET_MASK,
nb_clusters << s->cluster_bits, type);
case QCOW2_CLUSTER_ZERO:
if (l2_entry & L2E_OFFSET_MASK) {
qcow2_free_clusters(bs, l2_entry & L2E_OFFSET_MASK,
nb_clusters << s->cluster_bits, type);
}
break;
case QCOW2_CLUSTER_UNALLOCATED:
case QCOW2_CLUSTER_ZERO:
break;
default:
abort();
@ -861,15 +865,17 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
}
for(j = 0; j < s->l2_size; j++) {
uint64_t cluster_index;
offset = be64_to_cpu(l2_table[j]);
if (offset != 0) {
old_offset = offset;
offset &= ~QCOW_OFLAG_COPIED;
if (offset & QCOW_OFLAG_COMPRESSED) {
old_offset = offset;
offset &= ~QCOW_OFLAG_COPIED;
switch (qcow2_get_cluster_type(offset)) {
case QCOW2_CLUSTER_COMPRESSED:
nb_csectors = ((offset >> s->csize_shift) &
s->csize_mask) + 1;
if (addend != 0) {
int ret;
ret = update_refcount(bs,
(offset & s->cluster_offset_mask) & ~511,
nb_csectors * 512, addend,
@ -880,11 +886,20 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
}
/* compressed clusters are never modified */
refcount = 2;
} else {
uint64_t cluster_index = (offset & L2E_OFFSET_MASK) >> s->cluster_bits;
break;
case QCOW2_CLUSTER_NORMAL:
case QCOW2_CLUSTER_ZERO:
cluster_index = (offset & L2E_OFFSET_MASK) >> s->cluster_bits;
if (!cluster_index) {
/* unallocated */
refcount = 0;
break;
}
if (addend != 0) {
refcount = update_cluster_refcount(bs, cluster_index, addend,
QCOW2_DISCARD_SNAPSHOT);
refcount = qcow2_update_cluster_refcount(bs,
cluster_index, addend,
QCOW2_DISCARD_SNAPSHOT);
} else {
refcount = get_refcount(bs, cluster_index);
}
@ -893,19 +908,26 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
ret = refcount;
goto fail;
}
}
break;
if (refcount == 1) {
offset |= QCOW_OFLAG_COPIED;
}
if (offset != old_offset) {
if (addend > 0) {
qcow2_cache_set_dependency(bs, s->l2_table_cache,
s->refcount_block_cache);
}
l2_table[j] = cpu_to_be64(offset);
qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
case QCOW2_CLUSTER_UNALLOCATED:
refcount = 0;
break;
default:
abort();
}
if (refcount == 1) {
offset |= QCOW_OFLAG_COPIED;
}
if (offset != old_offset) {
if (addend > 0) {
qcow2_cache_set_dependency(bs, s->l2_table_cache,
s->refcount_block_cache);
}
l2_table[j] = cpu_to_be64(offset);
qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
}
}
@ -916,8 +938,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
if (addend != 0) {
refcount = update_cluster_refcount(bs, l2_offset >> s->cluster_bits, addend,
QCOW2_DISCARD_SNAPSHOT);
refcount = qcow2_update_cluster_refcount(bs, l2_offset >>
s->cluster_bits, addend, QCOW2_DISCARD_SNAPSHOT);
} else {
refcount = get_refcount(bs, l2_offset >> s->cluster_bits);
}
@ -1014,7 +1036,6 @@ static void inc_refcounts(BlockDriverState *bs,
/* Flags for check_refcounts_l1() and check_refcounts_l2() */
enum {
CHECK_OFLAG_COPIED = 0x1, /* check QCOW_OFLAG_COPIED matches refcount */
CHECK_FRAG_INFO = 0x2, /* update BlockFragInfo counters */
};
@ -1033,7 +1054,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
BDRVQcowState *s = bs->opaque;
uint64_t *l2_table, l2_entry;
uint64_t next_contiguous_offset = 0;
int i, l2_size, nb_csectors, refcount;
int i, l2_size, nb_csectors;
/* Read L2 table from disk */
l2_size = s->l2_size * sizeof(uint64_t);
@ -1085,23 +1106,8 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
case QCOW2_CLUSTER_NORMAL:
{
/* QCOW_OFLAG_COPIED must be set iff refcount == 1 */
uint64_t offset = l2_entry & L2E_OFFSET_MASK;
if (flags & CHECK_OFLAG_COPIED) {
refcount = get_refcount(bs, offset >> s->cluster_bits);
if (refcount < 0) {
fprintf(stderr, "Can't get refcount for offset %"
PRIx64 ": %s\n", l2_entry, strerror(-refcount));
goto fail;
}
if ((refcount == 1) != ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
fprintf(stderr, "ERROR OFLAG_COPIED: offset=%"
PRIx64 " refcount=%d\n", l2_entry, refcount);
res->corruptions++;
}
}
if (flags & CHECK_FRAG_INFO) {
res->bfi.allocated_clusters++;
if (next_contiguous_offset &&
@ -1158,7 +1164,7 @@ static int check_refcounts_l1(BlockDriverState *bs,
{
BDRVQcowState *s = bs->opaque;
uint64_t *l1_table, l2_offset, l1_size2;
int i, refcount, ret;
int i, ret;
l1_size2 = l1_size * sizeof(uint64_t);
@ -1182,22 +1188,6 @@ static int check_refcounts_l1(BlockDriverState *bs,
for(i = 0; i < l1_size; i++) {
l2_offset = l1_table[i];
if (l2_offset) {
/* QCOW_OFLAG_COPIED must be set iff refcount == 1 */
if (flags & CHECK_OFLAG_COPIED) {
refcount = get_refcount(bs, (l2_offset & ~QCOW_OFLAG_COPIED)
>> s->cluster_bits);
if (refcount < 0) {
fprintf(stderr, "Can't get refcount for l2_offset %"
PRIx64 ": %s\n", l2_offset, strerror(-refcount));
goto fail;
}
if ((refcount == 1) != ((l2_offset & QCOW_OFLAG_COPIED) != 0)) {
fprintf(stderr, "ERROR OFLAG_COPIED: l2_offset=%" PRIx64
" refcount=%d\n", l2_offset, refcount);
res->corruptions++;
}
}
/* Mark L2 table as used */
l2_offset &= L1E_OFFSET_MASK;
inc_refcounts(bs, res, refcount_table, refcount_table_size,
@ -1228,6 +1218,238 @@ fail:
return -EIO;
}
/*
* Checks the OFLAG_COPIED flag for all L1 and L2 entries.
*
* This function neither prints an error message nor increments check_errors
* if get_refcount fails, because the calling function
* (qcow2_check_refcounts) will already have detected and reported such an
* error by the time this function is called.
*/
static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
BdrvCheckMode fix)
{
BDRVQcowState *s = bs->opaque;
uint64_t *l2_table = qemu_blockalign(bs, s->cluster_size);
int ret;
int refcount;
int i, j;
for (i = 0; i < s->l1_size; i++) {
uint64_t l1_entry = s->l1_table[i];
uint64_t l2_offset = l1_entry & L1E_OFFSET_MASK;
bool l2_dirty = false;
if (!l2_offset) {
continue;
}
refcount = get_refcount(bs, l2_offset >> s->cluster_bits);
if (refcount < 0) {
/* neither print a message nor increment check_errors */
continue;
}
if ((refcount == 1) != ((l1_entry & QCOW_OFLAG_COPIED) != 0)) {
fprintf(stderr, "%s OFLAG_COPIED L2 cluster: l1_index=%d "
"l1_entry=%" PRIx64 " refcount=%d\n",
fix & BDRV_FIX_ERRORS ? "Repairing" :
"ERROR",
i, l1_entry, refcount);
if (fix & BDRV_FIX_ERRORS) {
s->l1_table[i] = refcount == 1
? l1_entry | QCOW_OFLAG_COPIED
: l1_entry & ~QCOW_OFLAG_COPIED;
ret = qcow2_write_l1_entry(bs, i);
if (ret < 0) {
res->check_errors++;
goto fail;
}
res->corruptions_fixed++;
} else {
res->corruptions++;
}
}
ret = bdrv_pread(bs->file, l2_offset, l2_table,
s->l2_size * sizeof(uint64_t));
if (ret < 0) {
fprintf(stderr, "ERROR: Could not read L2 table: %s\n",
strerror(-ret));
res->check_errors++;
goto fail;
}
for (j = 0; j < s->l2_size; j++) {
uint64_t l2_entry = be64_to_cpu(l2_table[j]);
uint64_t data_offset = l2_entry & L2E_OFFSET_MASK;
int cluster_type = qcow2_get_cluster_type(l2_entry);
if ((cluster_type == QCOW2_CLUSTER_NORMAL) ||
((cluster_type == QCOW2_CLUSTER_ZERO) && (data_offset != 0))) {
refcount = get_refcount(bs, data_offset >> s->cluster_bits);
if (refcount < 0) {
/* neither print a message nor increment check_errors */
continue;
}
if ((refcount == 1) != ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
fprintf(stderr, "%s OFLAG_COPIED data cluster: "
"l2_entry=%" PRIx64 " refcount=%d\n",
fix & BDRV_FIX_ERRORS ? "Repairing" :
"ERROR",
l2_entry, refcount);
if (fix & BDRV_FIX_ERRORS) {
l2_table[j] = cpu_to_be64(refcount == 1
? l2_entry | QCOW_OFLAG_COPIED
: l2_entry & ~QCOW_OFLAG_COPIED);
l2_dirty = true;
res->corruptions_fixed++;
} else {
res->corruptions++;
}
}
}
}
if (l2_dirty) {
ret = qcow2_pre_write_overlap_check(bs, QCOW2_OL_ACTIVE_L2,
l2_offset, s->cluster_size);
if (ret < 0) {
fprintf(stderr, "ERROR: Could not write L2 table; metadata "
"overlap check failed: %s\n", strerror(-ret));
res->check_errors++;
goto fail;
}
ret = bdrv_pwrite(bs->file, l2_offset, l2_table, s->cluster_size);
if (ret < 0) {
fprintf(stderr, "ERROR: Could not write L2 table: %s\n",
strerror(-ret));
res->check_errors++;
goto fail;
}
}
}
ret = 0;
fail:
qemu_vfree(l2_table);
return ret;
}
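
The invariant that check_oflag_copied enforces is small enough to state as two helpers: the COPIED flag must be set exactly when the referenced cluster has a refcount of 1. The sketch below is illustrative only; DEMO_OFLAG_COPIED and the demo_* functions are hypothetical stand-ins, and entries are assumed to be in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the COPIED flag bit of an L1/L2 entry. */
#define DEMO_OFLAG_COPIED (1ULL << 63)

/* True if the flag matches the refcount (flag set iff refcount == 1). */
static bool demo_copied_flag_consistent(uint64_t entry, int refcount)
{
    return (refcount == 1) == ((entry & DEMO_OFLAG_COPIED) != 0);
}

/* Returns the entry with the flag forced to the value the refcount implies. */
static uint64_t demo_repair_copied_flag(uint64_t entry, int refcount)
{
    assert(refcount >= 0); /* negative values are errors, not refcounts */
    return refcount == 1 ? entry | DEMO_OFLAG_COPIED
                         : entry & ~DEMO_OFLAG_COPIED;
}
```

The repair is idempotent: applying it to an entry that is already consistent returns the entry unchanged, so a table only needs to be written back when at least one entry actually differed.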
/*
* Writes one sector of the refcount table to the disk
*/
#define RT_ENTRIES_PER_SECTOR (512 / sizeof(uint64_t))
static int write_reftable_entry(BlockDriverState *bs, int rt_index)
{
BDRVQcowState *s = bs->opaque;
uint64_t buf[RT_ENTRIES_PER_SECTOR];
int rt_start_index;
int i, ret;
rt_start_index = rt_index & ~(RT_ENTRIES_PER_SECTOR - 1);
for (i = 0; i < RT_ENTRIES_PER_SECTOR; i++) {
buf[i] = cpu_to_be64(s->refcount_table[rt_start_index + i]);
}
ret = qcow2_pre_write_overlap_check(bs, QCOW2_OL_REFCOUNT_TABLE,
s->refcount_table_offset + rt_start_index * sizeof(uint64_t),
sizeof(buf));
if (ret < 0) {
return ret;
}
BLKDBG_EVENT(bs->file, BLKDBG_REFTABLE_UPDATE);
ret = bdrv_pwrite_sync(bs->file, s->refcount_table_offset +
rt_start_index * sizeof(uint64_t), buf, sizeof(buf));
if (ret < 0) {
return ret;
}
return 0;
}
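
write_reftable_entry always rewrites the whole 512-byte sector that contains the requested entry, so the index is first rounded down to a sector boundary. A minimal sketch of that rounding, with hypothetical demo_* names and assuming 8-byte table entries as in the code above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of 8-byte table entries that fit into one 512-byte sector. */
#define DEMO_ENTRIES_PER_SECTOR (512 / sizeof(uint64_t))

/* Rounds a table index down to the first index stored in the same sector. */
static size_t demo_sector_start_index(size_t index)
{
    /* the mask trick below only works for a power-of-two entry count */
    assert((DEMO_ENTRIES_PER_SECTOR & (DEMO_ENTRIES_PER_SECTOR - 1)) == 0);
    return index & ~(DEMO_ENTRIES_PER_SECTOR - 1);
}

/* Byte offset in the image file at which that sector of the table starts. */
static uint64_t demo_sector_byte_offset(uint64_t table_offset, size_t index)
{
    return table_offset + demo_sector_start_index(index) * sizeof(uint64_t);
}
```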
/*
* Allocates a new cluster for the given refcount block (represented by its
* offset in the image file) and copies the current content there. This function
* does _not_ decrement the reference count for the currently occupied cluster.
*
* This function prints an informative message to stderr on error (and returns
* -errno); on success, the offset of the newly allocated cluster is returned.
*/
static int64_t realloc_refcount_block(BlockDriverState *bs, int reftable_index,
uint64_t offset)
{
BDRVQcowState *s = bs->opaque;
int64_t new_offset = 0;
void *refcount_block = NULL;
int ret;
/* allocate new refcount block */
new_offset = qcow2_alloc_clusters(bs, s->cluster_size);
if (new_offset < 0) {
fprintf(stderr, "Could not allocate new cluster: %s\n",
strerror(-new_offset));
ret = new_offset;
goto fail;
}
/* fetch current refcount block content */
ret = qcow2_cache_get(bs, s->refcount_block_cache, offset, &refcount_block);
if (ret < 0) {
fprintf(stderr, "Could not fetch refcount block: %s\n", strerror(-ret));
goto fail;
}
/* the new block has not yet been entered into the refcount table, so it is
* not yet a refcount block as far as this check is concerned */
ret = qcow2_pre_write_overlap_check(bs, 0, new_offset, s->cluster_size);
if (ret < 0) {
fprintf(stderr, "Could not write refcount block; metadata overlap "
"check failed: %s\n", strerror(-ret));
/* the image will be marked corrupt, so don't even attempt to free
* the cluster */
new_offset = 0;
goto fail;
}
/* write to new block */
ret = bdrv_write(bs->file, new_offset / BDRV_SECTOR_SIZE, refcount_block,
s->cluster_sectors);
if (ret < 0) {
fprintf(stderr, "Could not write refcount block: %s\n", strerror(-ret));
goto fail;
}
/* update refcount table */
assert(!(new_offset & (s->cluster_size - 1)));
s->refcount_table[reftable_index] = new_offset;
ret = write_reftable_entry(bs, reftable_index);
if (ret < 0) {
fprintf(stderr, "Could not update refcount table: %s\n",
strerror(-ret));
goto fail;
}
fail:
if (new_offset && (ret < 0)) {
qcow2_free_clusters(bs, new_offset, s->cluster_size,
QCOW2_DISCARD_ALWAYS);
}
if (refcount_block) {
if (ret < 0) {
qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
} else {
ret = qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
}
}
if (ret < 0) {
return ret;
}
return new_offset;
}
/*
* Checks an image for refcount consistency.
*
@ -1257,8 +1479,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
/* current L1 table */
ret = check_refcounts_l1(bs, res, refcount_table, nb_clusters,
s->l1_table_offset, s->l1_size,
CHECK_OFLAG_COPIED | CHECK_FRAG_INFO);
s->l1_table_offset, s->l1_size, CHECK_FRAG_INFO);
if (ret < 0) {
goto fail;
}
@ -1304,10 +1525,39 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
inc_refcounts(bs, res, refcount_table, nb_clusters,
offset, s->cluster_size);
if (refcount_table[cluster] != 1) {
fprintf(stderr, "ERROR refcount block %" PRId64
fprintf(stderr, "%s refcount block %" PRId64
" refcount=%d\n",
fix & BDRV_FIX_ERRORS ? "Repairing" :
"ERROR",
i, refcount_table[cluster]);
res->corruptions++;
if (fix & BDRV_FIX_ERRORS) {
int64_t new_offset;
new_offset = realloc_refcount_block(bs, i, offset);
if (new_offset < 0) {
res->corruptions++;
continue;
}
/* update refcounts */
if ((new_offset >> s->cluster_bits) >= nb_clusters) {
/* increase refcount_table size if necessary */
int old_nb_clusters = nb_clusters;
nb_clusters = (new_offset >> s->cluster_bits) + 1;
refcount_table = g_realloc(refcount_table,
nb_clusters * sizeof(uint16_t));
memset(&refcount_table[old_nb_clusters], 0, (nb_clusters
- old_nb_clusters) * sizeof(uint16_t));
}
refcount_table[cluster]--;
inc_refcounts(bs, res, refcount_table, nb_clusters,
new_offset, s->cluster_size);
res->corruptions_fixed++;
} else {
res->corruptions++;
}
}
}
}
@ -1363,6 +1613,12 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
}
}
/* check OFLAG_COPIED */
ret = check_oflag_copied(bs, res, fix);
if (ret < 0) {
goto fail;
}
res->image_end_offset = (highest_cluster + 1) * s->cluster_size;
ret = 0;
@ -1372,3 +1628,173 @@ fail:
return ret;
}
#define overlaps_with(ofs, sz) \
ranges_overlap(offset, size, ofs, sz)
/*
* Checks if the given offset into the image file is actually free to use by
* looking for overlaps with important metadata sections (L1/L2 tables etc.),
* i.e. a sanity check without relying on the refcount tables.
*
* The ign parameter specifies what checks not to perform (being a bitmask of
* QCow2MetadataOverlap values), i.e., what sections to ignore.
*
* Returns:
* - 0 if writing to this offset will not affect the mentioned metadata
* - a positive QCow2MetadataOverlap value indicating one overlapping section
* - a negative value (-errno) indicating an error while performing a check,
* e.g. when bdrv_read failed on QCOW2_OL_INACTIVE_L2
*/
int qcow2_check_metadata_overlap(BlockDriverState *bs, int ign, int64_t offset,
int64_t size)
{
BDRVQcowState *s = bs->opaque;
int chk = s->overlap_check & ~ign;
int i, j;
if (!size) {
return 0;
}
if (chk & QCOW2_OL_MAIN_HEADER) {
if (offset < s->cluster_size) {
return QCOW2_OL_MAIN_HEADER;
}
}
/* align range to test to cluster boundaries */
size = align_offset(offset_into_cluster(s, offset) + size, s->cluster_size);
offset = start_of_cluster(s, offset);
if ((chk & QCOW2_OL_ACTIVE_L1) && s->l1_size) {
if (overlaps_with(s->l1_table_offset, s->l1_size * sizeof(uint64_t))) {
return QCOW2_OL_ACTIVE_L1;
}
}
if ((chk & QCOW2_OL_REFCOUNT_TABLE) && s->refcount_table_size) {
if (overlaps_with(s->refcount_table_offset,
s->refcount_table_size * sizeof(uint64_t))) {
return QCOW2_OL_REFCOUNT_TABLE;
}
}
if ((chk & QCOW2_OL_SNAPSHOT_TABLE) && s->snapshots_size) {
if (overlaps_with(s->snapshots_offset, s->snapshots_size)) {
return QCOW2_OL_SNAPSHOT_TABLE;
}
}
if ((chk & QCOW2_OL_INACTIVE_L1) && s->snapshots) {
for (i = 0; i < s->nb_snapshots; i++) {
if (s->snapshots[i].l1_size &&
overlaps_with(s->snapshots[i].l1_table_offset,
s->snapshots[i].l1_size * sizeof(uint64_t))) {
return QCOW2_OL_INACTIVE_L1;
}
}
}
if ((chk & QCOW2_OL_ACTIVE_L2) && s->l1_table) {
for (i = 0; i < s->l1_size; i++) {
if ((s->l1_table[i] & L1E_OFFSET_MASK) &&
overlaps_with(s->l1_table[i] & L1E_OFFSET_MASK,
s->cluster_size)) {
return QCOW2_OL_ACTIVE_L2;
}
}
}
if ((chk & QCOW2_OL_REFCOUNT_BLOCK) && s->refcount_table) {
for (i = 0; i < s->refcount_table_size; i++) {
if ((s->refcount_table[i] & REFT_OFFSET_MASK) &&
overlaps_with(s->refcount_table[i] & REFT_OFFSET_MASK,
s->cluster_size)) {
return QCOW2_OL_REFCOUNT_BLOCK;
}
}
}
if ((chk & QCOW2_OL_INACTIVE_L2) && s->snapshots) {
for (i = 0; i < s->nb_snapshots; i++) {
uint64_t l1_ofs = s->snapshots[i].l1_table_offset;
uint32_t l1_sz = s->snapshots[i].l1_size;
uint64_t l1_sz2 = l1_sz * sizeof(uint64_t);
uint64_t *l1 = g_malloc(l1_sz2);
int ret;
ret = bdrv_pread(bs->file, l1_ofs, l1, l1_sz2);
if (ret < 0) {
g_free(l1);
return ret;
}
for (j = 0; j < l1_sz; j++) {
uint64_t l2_ofs = be64_to_cpu(l1[j]) & L1E_OFFSET_MASK;
if (l2_ofs && overlaps_with(l2_ofs, s->cluster_size)) {
g_free(l1);
return QCOW2_OL_INACTIVE_L2;
}
}
g_free(l1);
}
}
return 0;
}
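
Two details of the check above are easy to miss: the range under test is widened to whole clusters before any comparison, and each comparison is an ordinary interval-overlap test. The sketch below shows both under stated assumptions: demo_ranges_overlap is a hypothetical closed-interval test written for this example (the body of the real ranges_overlap helper from qemu/range.h is not shown in this diff), and demo_align_to_clusters assumes a power-of-two cluster size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if [first1, first1 + len1) and [first2, first2 + len2) intersect. */
static bool demo_ranges_overlap(uint64_t first1, uint64_t len1,
                                uint64_t first2, uint64_t len2)
{
    uint64_t last1, last2;

    assert(len1 > 0 && len2 > 0);
    last1 = first1 + len1 - 1;
    last2 = first2 + len2 - 1;
    return !(last2 < first1 || last1 < first2);
}

/* Widens (offset, size) so that it starts and ends on cluster boundaries. */
static void demo_align_to_clusters(uint64_t cluster_size,
                                   uint64_t *offset, uint64_t *size)
{
    uint64_t into_cluster;

    assert(cluster_size && !(cluster_size & (cluster_size - 1)));
    into_cluster = *offset & (cluster_size - 1);
    *size = (into_cluster + *size + cluster_size - 1) & ~(cluster_size - 1);
    *offset -= into_cluster;
}
```

For example, with 64 KiB clusters a 256-byte write at 0x20100 does not touch an 8-byte structure at 0x20000, but after alignment the tested range becomes the whole cluster at 0x20000 and the overlap is reported. The widening makes the check conservative: a write into any part of a cluster that holds metadata is treated as a conflict.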
static const char *metadata_ol_names[] = {
[QCOW2_OL_MAIN_HEADER_BITNR] = "qcow2_header",
[QCOW2_OL_ACTIVE_L1_BITNR] = "active L1 table",
[QCOW2_OL_ACTIVE_L2_BITNR] = "active L2 table",
[QCOW2_OL_REFCOUNT_TABLE_BITNR] = "refcount table",
[QCOW2_OL_REFCOUNT_BLOCK_BITNR] = "refcount block",
[QCOW2_OL_SNAPSHOT_TABLE_BITNR] = "snapshot table",
[QCOW2_OL_INACTIVE_L1_BITNR] = "inactive L1 table",
[QCOW2_OL_INACTIVE_L2_BITNR] = "inactive L2 table",
};
/*
* First performs a check for metadata overlaps (through
* qcow2_check_metadata_overlap); if that fails with a negative value (error
* while performing a check), that value is returned. If an impending overlap
* is detected, the BDS will be made unusable, the qcow2 file marked corrupt
* and -EIO returned.
*
* Returns 0 if there were neither overlaps nor errors while checking for
* overlaps; or a negative value (-errno) on error.
*/
int qcow2_pre_write_overlap_check(BlockDriverState *bs, int ign, int64_t offset,
int64_t size)
{
int ret = qcow2_check_metadata_overlap(bs, ign, offset, size);
if (ret < 0) {
return ret;
} else if (ret > 0) {
int metadata_ol_bitnr = ffs(ret) - 1;
char *message;
QObject *data;
assert(metadata_ol_bitnr < QCOW2_OL_MAX_BITNR);
fprintf(stderr, "qcow2: Preventing invalid write on metadata (overlaps "
"with %s); image marked as corrupt.\n",
metadata_ol_names[metadata_ol_bitnr]);
message = g_strdup_printf("Prevented %s overwrite",
metadata_ol_names[metadata_ol_bitnr]);
data = qobject_from_jsonf("{ 'device': %s, 'msg': %s, 'offset': %"
PRId64 ", 'size': %" PRId64 " }", bs->device_name, message,
offset, size);
monitor_protocol_event(QEVENT_BLOCK_IMAGE_CORRUPTED, data);
g_free(message);
qobject_decref(data);
qcow2_mark_corrupt(bs);
bs->drv = NULL; /* make BDS unusable */
return -EIO;
}
return 0;
}
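
The error path above turns a one-bit overlap result into a readable name by taking the index of the lowest set bit and using it to index a string table. A minimal sketch of that lookup follows; it uses a portable loop in place of ffs(), and the enum values, demo_ol_names and demo_ol_name are hypothetical names for this example rather than the qcow2 definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit numbers; each section type owns one bit of the mask. */
enum {
    DEMO_OL_HEADER_BITNR = 0,
    DEMO_OL_L1_BITNR,
    DEMO_OL_L2_BITNR,
    DEMO_OL_MAX_BITNR
};

static const char *const demo_ol_names[] = {
    [DEMO_OL_HEADER_BITNR] = "header",
    [DEMO_OL_L1_BITNR]     = "L1 table",
    [DEMO_OL_L2_BITNR]     = "L2 table",
};

/* Zero-based index of the lowest set bit, or -1 if mask is not positive. */
static int demo_lowest_set_bit(int mask)
{
    int bit = 0;

    if (mask <= 0) {
        return -1;
    }
    while (!(mask & 1)) {
        mask >>= 1;
        bit++;
    }
    return bit;
}

/* Name of the section a mask value refers to, or NULL if out of range. */
static const char *demo_ol_name(int mask)
{
    int bit = demo_lowest_set_bit(mask);

    if (bit < 0 || bit >= DEMO_OL_MAX_BITNR) {
        return NULL;
    }
    return demo_ol_names[bit];
}
```

Returning NULL for an out-of-range bit is a choice made for this sketch; the function above asserts instead, which is reasonable there because the value comes from its own overlap check.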


@ -182,13 +182,22 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
snapshots_offset = qcow2_alloc_clusters(bs, snapshots_size);
offset = snapshots_offset;
if (offset < 0) {
return offset;
ret = offset;
goto fail;
}
ret = bdrv_flush(bs);
if (ret < 0) {
return ret;
goto fail;
}
/* The snapshot list position has not yet been updated, so these clusters
* must indeed be completely free */
ret = qcow2_pre_write_overlap_check(bs, 0, offset, snapshots_size);
if (ret < 0) {
goto fail;
}
/* Write all snapshots to the new list */
for(i = 0; i < s->nb_snapshots; i++) {
sn = s->snapshots + i;
@ -211,6 +220,7 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
id_str_size = strlen(sn->id_str);
name_size = strlen(sn->name);
assert(id_str_size <= UINT16_MAX && name_size <= UINT16_MAX);
h.id_str_size = cpu_to_be16(id_str_size);
h.name_size = cpu_to_be16(name_size);
offset = align_offset(offset, 8);
@ -269,6 +279,10 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
return 0;
fail:
if (snapshots_offset > 0) {
qcow2_free_clusters(bs, snapshots_offset, snapshots_size,
QCOW2_DISCARD_ALWAYS);
}
return ret;
}
@ -277,7 +291,8 @@ static void find_new_snapshot_id(BlockDriverState *bs,
{
BDRVQcowState *s = bs->opaque;
QCowSnapshot *sn;
int i, id, id_max = 0;
int i;
unsigned long id, id_max = 0;
for(i = 0; i < s->nb_snapshots; i++) {
sn = s->snapshots + i;
@ -285,34 +300,50 @@ static void find_new_snapshot_id(BlockDriverState *bs,
if (id > id_max)
id_max = id;
}
snprintf(id_str, id_str_size, "%d", id_max + 1);
snprintf(id_str, id_str_size, "%lu", id_max + 1);
}
static int find_snapshot_by_id(BlockDriverState *bs, const char *id_str)
static int find_snapshot_by_id_and_name(BlockDriverState *bs,
const char *id,
const char *name)
{
BDRVQcowState *s = bs->opaque;
int i;
for(i = 0; i < s->nb_snapshots; i++) {
if (!strcmp(s->snapshots[i].id_str, id_str))
return i;
if (id && name) {
for (i = 0; i < s->nb_snapshots; i++) {
if (!strcmp(s->snapshots[i].id_str, id) &&
!strcmp(s->snapshots[i].name, name)) {
return i;
}
}
} else if (id) {
for (i = 0; i < s->nb_snapshots; i++) {
if (!strcmp(s->snapshots[i].id_str, id)) {
return i;
}
}
} else if (name) {
for (i = 0; i < s->nb_snapshots; i++) {
if (!strcmp(s->snapshots[i].name, name)) {
return i;
}
}
}
return -1;
}
static int find_snapshot_by_id_or_name(BlockDriverState *bs, const char *name)
static int find_snapshot_by_id_or_name(BlockDriverState *bs,
const char *id_or_name)
{
BDRVQcowState *s = bs->opaque;
int i, ret;
int ret;
ret = find_snapshot_by_id(bs, name);
if (ret >= 0)
ret = find_snapshot_by_id_and_name(bs, id_or_name, NULL);
if (ret >= 0) {
return ret;
for(i = 0; i < s->nb_snapshots; i++) {
if (!strcmp(s->snapshots[i].name, name))
return i;
}
return -1;
return find_snapshot_by_id_and_name(bs, NULL, id_or_name);
}
/* if no id is provided, a new one is constructed */
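The lookup rules introduced by find_snapshot_by_id_and_name() can be sketched outside QEMU as follows. This is a minimal illustration, not the qcow2 code: `struct snap`, `sample_tab` and both function names are hypothetical stand-ins, and the three separate loops of the original are folded into one loop with the same matching rules (both given: both must match; one given: only that one is compared; neither given: no match).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the id/name pair stored per snapshot. */
struct snap {
    const char *id_str;
    const char *name;
};

/* Sample table used only for illustration. */
static const struct snap sample_tab[] = {
    { "1", "base" },
    { "2", "1"    },   /* a name that collides with another snapshot's ID */
    { "3", "base" },
};
static const int sample_count = 3;

/* Return the index of the first entry matching every non-NULL key, or -1. */
static int find_by_id_and_name(const struct snap *tab, int n,
                               const char *id, const char *name)
{
    int i;

    if (!id && !name) {
        return -1;              /* nothing to search for */
    }
    for (i = 0; i < n; i++) {
        if (id && strcmp(tab[i].id_str, id) != 0) {
            continue;
        }
        if (name && strcmp(tab[i].name, name) != 0) {
            continue;
        }
        return i;
    }
    return -1;
}

/* Try the string as an ID first, then fall back to treating it as a name. */
static int find_by_id_or_name(const struct snap *tab, int n,
                              const char *id_or_name)
{
    int ret = find_by_id_and_name(tab, n, id_or_name, NULL);

    if (ret >= 0) {
        return ret;
    }
    return find_by_id_and_name(tab, n, NULL, id_or_name);
}
```

With the sample table, looking up "1" through the id-or-name helper resolves to the snapshot whose ID is "1", not the one named "1", because the ID pass runs first.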
@ -334,7 +365,7 @@ int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
}
/* Check that the ID is unique */
if (find_snapshot_by_id(bs, sn_info->id_str) >= 0) {
if (find_snapshot_by_id_and_name(bs, sn_info->id_str, NULL) >= 0) {
return -EEXIST;
}
@ -363,6 +394,12 @@ int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
l1_table[i] = cpu_to_be64(s->l1_table[i]);
}
ret = qcow2_pre_write_overlap_check(bs, 0, sn->l1_table_offset,
s->l1_size * sizeof(uint64_t));
if (ret < 0) {
goto fail;
}
ret = bdrv_pwrite(bs->file, sn->l1_table_offset, l1_table,
s->l1_size * sizeof(uint64_t));
if (ret < 0) {
@ -396,11 +433,19 @@ int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
if (ret < 0) {
g_free(s->snapshots);
s->snapshots = old_snapshot_list;
s->nb_snapshots--;
goto fail;
}
g_free(old_snapshot_list);
/* The VM state isn't needed any more in the active L1 table; in fact, it
* hurts by causing expensive COW for the next snapshot. */
qcow2_discard_clusters(bs, qcow2_vm_state_offset(s),
align_offset(sn->vm_state_size, s->cluster_size)
>> BDRV_SECTOR_BITS,
QCOW2_DISCARD_NEVER);
#ifdef DEBUG_ALLOC
{
BdrvCheckResult result = {0};
@ -475,6 +520,12 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
goto fail;
}
ret = qcow2_pre_write_overlap_check(bs, QCOW2_OL_ACTIVE_L1,
s->l1_table_offset, cur_l1_bytes);
if (ret < 0) {
goto fail;
}
ret = bdrv_pwrite_sync(bs->file, s->l1_table_offset, sn_l1_table,
cur_l1_bytes);
if (ret < 0) {
@ -531,15 +582,19 @@ fail:
return ret;
}
int qcow2_snapshot_delete(BlockDriverState *bs, const char *snapshot_id)
int qcow2_snapshot_delete(BlockDriverState *bs,
const char *snapshot_id,
const char *name,
Error **errp)
{
BDRVQcowState *s = bs->opaque;
QCowSnapshot sn;
int snapshot_index, ret;
/* Search the snapshot */
snapshot_index = find_snapshot_by_id_or_name(bs, snapshot_id);
snapshot_index = find_snapshot_by_id_and_name(bs, snapshot_id, name);
if (snapshot_index < 0) {
error_setg(errp, "Can't find the snapshot");
return -ENOENT;
}
sn = s->snapshots[snapshot_index];
@ -551,6 +606,7 @@ int qcow2_snapshot_delete(BlockDriverState *bs, const char *snapshot_id)
s->nb_snapshots--;
ret = qcow2_write_snapshots(bs);
if (ret < 0) {
error_setg(errp, "Failed to remove snapshot from snapshot list");
return ret;
}
@ -568,6 +624,7 @@ int qcow2_snapshot_delete(BlockDriverState *bs, const char *snapshot_id)
ret = qcow2_update_snapshot_refcount(bs, sn.l1_table_offset,
sn.l1_size, -1);
if (ret < 0) {
error_setg(errp, "Failed to free the cluster and L1 table");
return ret;
}
qcow2_free_clusters(bs, sn.l1_table_offset, sn.l1_size * sizeof(uint64_t),
@ -576,6 +633,7 @@ int qcow2_snapshot_delete(BlockDriverState *bs, const char *snapshot_id)
/* must update the copied flag on the current cluster offsets */
ret = qcow2_update_snapshot_refcount(bs, s->l1_table_offset, s->l1_size, 0);
if (ret < 0) {
error_setg(errp, "Failed to update snapshot status in disk");
return ret;
}

File diff suppressed because it is too large


@ -40,11 +40,11 @@
#define QCOW_MAX_CRYPT_CLUSTERS 32
/* indicate that the refcount of the referenced cluster is exactly one. */
#define QCOW_OFLAG_COPIED (1LL << 63)
#define QCOW_OFLAG_COPIED (1ULL << 63)
/* indicate that the cluster is compressed (they never have the copied flag) */
#define QCOW_OFLAG_COMPRESSED (1LL << 62)
#define QCOW_OFLAG_COMPRESSED (1ULL << 62)
/* The cluster reads as all zeros */
#define QCOW_OFLAG_ZERO (1LL << 0)
#define QCOW_OFLAG_ZERO (1ULL << 0)
#define REFCOUNT_SHIFT 1 /* refcount size is 2 bytes */
@ -63,6 +63,15 @@
#define QCOW2_OPT_DISCARD_REQUEST "pass-discard-request"
#define QCOW2_OPT_DISCARD_SNAPSHOT "pass-discard-snapshot"
#define QCOW2_OPT_DISCARD_OTHER "pass-discard-other"
#define QCOW2_OPT_OVERLAP "overlap-check"
#define QCOW2_OPT_OVERLAP_MAIN_HEADER "overlap-check.main-header"
#define QCOW2_OPT_OVERLAP_ACTIVE_L1 "overlap-check.active-l1"
#define QCOW2_OPT_OVERLAP_ACTIVE_L2 "overlap-check.active-l2"
#define QCOW2_OPT_OVERLAP_REFCOUNT_TABLE "overlap-check.refcount-table"
#define QCOW2_OPT_OVERLAP_REFCOUNT_BLOCK "overlap-check.refcount-block"
#define QCOW2_OPT_OVERLAP_SNAPSHOT_TABLE "overlap-check.snapshot-table"
#define QCOW2_OPT_OVERLAP_INACTIVE_L1 "overlap-check.inactive-l1"
#define QCOW2_OPT_OVERLAP_INACTIVE_L2 "overlap-check.inactive-l2"
typedef struct QCowHeader {
uint32_t magic;
@ -86,7 +95,7 @@ typedef struct QCowHeader {
uint32_t refcount_order;
uint32_t header_length;
} QCowHeader;
} QEMU_PACKED QCowHeader;
typedef struct QCowSnapshot {
uint64_t l1_table_offset;
@ -119,9 +128,12 @@ enum {
/* Incompatible feature bits */
enum {
QCOW2_INCOMPAT_DIRTY_BITNR = 0,
QCOW2_INCOMPAT_CORRUPT_BITNR = 1,
QCOW2_INCOMPAT_DIRTY = 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
QCOW2_INCOMPAT_CORRUPT = 1 << QCOW2_INCOMPAT_CORRUPT_BITNR,
QCOW2_INCOMPAT_MASK = QCOW2_INCOMPAT_DIRTY,
QCOW2_INCOMPAT_MASK = QCOW2_INCOMPAT_DIRTY
| QCOW2_INCOMPAT_CORRUPT,
};
/* Compatible feature bits */
@ -196,9 +208,12 @@ typedef struct BDRVQcowState {
int flags;
int qcow_version;
bool use_lazy_refcounts;
int refcount_order;
bool discard_passthrough[QCOW2_DISCARD_MAX];
int overlap_check; /* bitmask of Qcow2MetadataOverlap values */
uint64_t incompatible_features;
uint64_t compatible_features;
uint64_t autoclear_features;
@ -286,6 +301,45 @@ enum {
QCOW2_CLUSTER_ZERO
};
typedef enum QCow2MetadataOverlap {
QCOW2_OL_MAIN_HEADER_BITNR = 0,
QCOW2_OL_ACTIVE_L1_BITNR = 1,
QCOW2_OL_ACTIVE_L2_BITNR = 2,
QCOW2_OL_REFCOUNT_TABLE_BITNR = 3,
QCOW2_OL_REFCOUNT_BLOCK_BITNR = 4,
QCOW2_OL_SNAPSHOT_TABLE_BITNR = 5,
QCOW2_OL_INACTIVE_L1_BITNR = 6,
QCOW2_OL_INACTIVE_L2_BITNR = 7,
QCOW2_OL_MAX_BITNR = 8,
QCOW2_OL_NONE = 0,
QCOW2_OL_MAIN_HEADER = (1 << QCOW2_OL_MAIN_HEADER_BITNR),
QCOW2_OL_ACTIVE_L1 = (1 << QCOW2_OL_ACTIVE_L1_BITNR),
QCOW2_OL_ACTIVE_L2 = (1 << QCOW2_OL_ACTIVE_L2_BITNR),
QCOW2_OL_REFCOUNT_TABLE = (1 << QCOW2_OL_REFCOUNT_TABLE_BITNR),
QCOW2_OL_REFCOUNT_BLOCK = (1 << QCOW2_OL_REFCOUNT_BLOCK_BITNR),
QCOW2_OL_SNAPSHOT_TABLE = (1 << QCOW2_OL_SNAPSHOT_TABLE_BITNR),
QCOW2_OL_INACTIVE_L1 = (1 << QCOW2_OL_INACTIVE_L1_BITNR),
/* NOTE: Checking overlaps with inactive L2 tables will result in bdrv
* reads. */
QCOW2_OL_INACTIVE_L2 = (1 << QCOW2_OL_INACTIVE_L2_BITNR),
} QCow2MetadataOverlap;
/* Perform all overlap checks which can be done in constant time */
#define QCOW2_OL_CONSTANT \
(QCOW2_OL_MAIN_HEADER | QCOW2_OL_ACTIVE_L1 | QCOW2_OL_REFCOUNT_TABLE | \
QCOW2_OL_SNAPSHOT_TABLE)
/* Perform all overlap checks which don't require disk access */
#define QCOW2_OL_CACHED \
(QCOW2_OL_CONSTANT | QCOW2_OL_ACTIVE_L2 | QCOW2_OL_REFCOUNT_BLOCK | \
QCOW2_OL_INACTIVE_L1)
/* Perform all overlap checks */
#define QCOW2_OL_ALL \
(QCOW2_OL_CACHED | QCOW2_OL_INACTIVE_L2)
#define L1E_OFFSET_MASK 0x00ffffffffffff00ULL
#define L2E_OFFSET_MASK 0x00ffffffffffff00ULL
#define L2E_COMPRESSED_OFFSET_SIZE_MASK 0x3fffffffffffffffULL
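How the overlap-check bitmask composes can be shown with a small standalone sketch. The `OL_*` names below mirror the QCOW2_OL_* values from the hunk above but are local to this example, and `checks_to_run()` is a hypothetical helper: it assumes the `ign` argument of the overlap-check functions is a mask of checks to skip, which is consistent with qcow2_snapshot_goto() passing QCOW2_OL_ACTIVE_L1 when it deliberately overwrites the active L1 table.

```c
/* Local copies of the bit layout shown in the hunk above. */
enum {
    OL_MAIN_HEADER    = 1 << 0,
    OL_ACTIVE_L1      = 1 << 1,
    OL_ACTIVE_L2      = 1 << 2,
    OL_REFCOUNT_TABLE = 1 << 3,
    OL_REFCOUNT_BLOCK = 1 << 4,
    OL_SNAPSHOT_TABLE = 1 << 5,
    OL_INACTIVE_L1    = 1 << 6,
    OL_INACTIVE_L2    = 1 << 7
};

/* Checks that can be done in constant time. */
#define OL_CONSTANT \
    (OL_MAIN_HEADER | OL_ACTIVE_L1 | OL_REFCOUNT_TABLE | OL_SNAPSHOT_TABLE)

/* Checks that need no disk access. */
#define OL_CACHED \
    (OL_CONSTANT | OL_ACTIVE_L2 | OL_REFCOUNT_BLOCK | OL_INACTIVE_L1)

/* Every check, including the one that reads inactive L2 tables. */
#define OL_ALL (OL_CACHED | OL_INACTIVE_L2)

/* Hypothetical helper: enabled checks minus the ones the caller ignores. */
static int checks_to_run(int enabled, int ign)
{
    return enabled & ~ign;
}
```

Under these definitions the three presets evaluate to 0x2b, 0x7f and 0xff, so each preset is a strict superset of the previous one.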
@ -324,6 +378,11 @@ static inline int64_t align_offset(int64_t offset, int n)
return offset;
}
static inline int64_t qcow2_vm_state_offset(BDRVQcowState *s)
{
return (int64_t)s->l1_vm_state_index << (s->cluster_bits + s->l2_bits);
}
static inline int qcow2_get_cluster_type(uint64_t l2_entry)
{
if (l2_entry & QCOW_OFLAG_COMPRESSED) {
@ -361,12 +420,17 @@ int qcow2_backing_read1(BlockDriverState *bs, QEMUIOVector *qiov,
int64_t sector_num, int nb_sectors);
int qcow2_mark_dirty(BlockDriverState *bs);
int qcow2_mark_corrupt(BlockDriverState *bs);
int qcow2_mark_consistent(BlockDriverState *bs);
int qcow2_update_header(BlockDriverState *bs);
/* qcow2-refcount.c functions */
int qcow2_refcount_init(BlockDriverState *bs);
void qcow2_refcount_close(BlockDriverState *bs);
int qcow2_update_cluster_refcount(BlockDriverState *bs, int64_t cluster_index,
int addend, enum qcow2_discard_type type);
int64_t qcow2_alloc_clusters(BlockDriverState *bs, int64_t size);
int qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t offset,
int nb_clusters);
@ -385,9 +449,15 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
void qcow2_process_discards(BlockDriverState *bs, int ret);
int qcow2_check_metadata_overlap(BlockDriverState *bs, int ign, int64_t offset,
int64_t size);
int qcow2_pre_write_overlap_check(BlockDriverState *bs, int ign, int64_t offset,
int64_t size);
/* qcow2-cluster.c functions */
int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
bool exact_size);
int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index);
void qcow2_l2_cache_reset(BlockDriverState *bs);
int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset);
void qcow2_encrypt_sectors(BDRVQcowState *s, int64_t sector_num,
@ -405,13 +475,18 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m);
int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
int nb_sectors);
int nb_sectors, enum qcow2_discard_type type);
int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors);
int qcow2_expand_zero_clusters(BlockDriverState *bs);
/* qcow2-snapshot.c functions */
int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info);
int qcow2_snapshot_goto(BlockDriverState *bs, const char *snapshot_id);
int qcow2_snapshot_delete(BlockDriverState *bs, const char *snapshot_id);
int qcow2_snapshot_delete(BlockDriverState *bs,
const char *snapshot_id,
const char *name,
Error **errp);
int qcow2_snapshot_list(BlockDriverState *bs, QEMUSnapshotInfo **psn_tab);
int qcow2_snapshot_load_tmp(BlockDriverState *bs, const char *snapshot_name);
@ -428,6 +503,8 @@ int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
Qcow2Cache *dependency);
void qcow2_cache_depends_on_flush(Qcow2Cache *c);
int qcow2_cache_empty(BlockDriverState *bs, Qcow2Cache *c);
int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
void **table);
int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,


@ -353,10 +353,10 @@ static void qed_start_need_check_timer(BDRVQEDState *s)
{
trace_qed_start_need_check_timer(s);
/* Use vm_clock so we don't alter the image file while suspended for
/* Use QEMU_CLOCK_VIRTUAL so we don't alter the image file while suspended for
* migration.
*/
qemu_mod_timer(s->need_check_timer, qemu_get_clock_ns(vm_clock) +
timer_mod(s->need_check_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
get_ticks_per_sec() * QED_NEED_CHECK_TIMEOUT);
}
@ -364,7 +364,7 @@ static void qed_start_need_check_timer(BDRVQEDState *s)
static void qed_cancel_need_check_timer(BDRVQEDState *s)
{
trace_qed_cancel_need_check_timer(s);
qemu_del_timer(s->need_check_timer);
timer_del(s->need_check_timer);
}
static void bdrv_qed_rebind(BlockDriverState *bs)
@ -373,7 +373,8 @@ static void bdrv_qed_rebind(BlockDriverState *bs)
s->bs = bs;
}
static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags)
static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVQEDState *s = bs->opaque;
QEDHeader le_header;
@ -494,7 +495,7 @@ static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags)
}
}
s->need_check_timer = qemu_new_timer_ns(vm_clock,
s->need_check_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
qed_need_check_timer_cb, s);
out:
@ -518,7 +519,7 @@ static void bdrv_qed_close(BlockDriverState *bs)
BDRVQEDState *s = bs->opaque;
qed_cancel_need_check_timer(s);
qemu_free_timer(s->need_check_timer);
timer_free(s->need_check_timer);
/* Ensure writes reach stable storage */
bdrv_flush(bs->file);
@ -550,16 +551,22 @@ static int qed_create(const char *filename, uint32_t cluster_size,
QEDHeader le_header;
uint8_t *l1_table = NULL;
size_t l1_size = header.cluster_size * header.table_size;
Error *local_err = NULL;
int ret = 0;
BlockDriverState *bs = NULL;
ret = bdrv_create_file(filename, NULL);
ret = bdrv_create_file(filename, NULL, &local_err);
if (ret < 0) {
qerror_report_err(local_err);
error_free(local_err);
return ret;
}
ret = bdrv_file_open(&bs, filename, NULL, BDRV_O_RDWR | BDRV_O_CACHE_WB);
ret = bdrv_file_open(&bs, filename, NULL, BDRV_O_RDWR | BDRV_O_CACHE_WB,
&local_err);
if (ret < 0) {
qerror_report_err(local_err);
error_free(local_err);
return ret;
}
@ -599,11 +606,12 @@ static int qed_create(const char *filename, uint32_t cluster_size,
ret = 0; /* success */
out:
g_free(l1_table);
bdrv_delete(bs);
bdrv_unref(bs);
return ret;
}
static int bdrv_qed_create(const char *filename, QEMUOptionParameter *options)
static int bdrv_qed_create(const char *filename, QEMUOptionParameter *options,
Error **errp)
{
uint64_t image_size = 0;
uint32_t cluster_size = QED_DEFAULT_CLUSTER_SIZE;
@ -652,45 +660,66 @@ static int bdrv_qed_create(const char *filename, QEMUOptionParameter *options)
}
typedef struct {
BlockDriverState *bs;
Coroutine *co;
int is_allocated;
uint64_t pos;
int64_t status;
int *pnum;
} QEDIsAllocatedCB;
static void qed_is_allocated_cb(void *opaque, int ret, uint64_t offset, size_t len)
{
QEDIsAllocatedCB *cb = opaque;
BDRVQEDState *s = cb->bs->opaque;
*cb->pnum = len / BDRV_SECTOR_SIZE;
cb->is_allocated = (ret == QED_CLUSTER_FOUND || ret == QED_CLUSTER_ZERO);
switch (ret) {
case QED_CLUSTER_FOUND:
offset |= qed_offset_into_cluster(s, cb->pos);
cb->status = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | offset;
break;
case QED_CLUSTER_ZERO:
cb->status = BDRV_BLOCK_ZERO;
break;
case QED_CLUSTER_L2:
case QED_CLUSTER_L1:
cb->status = 0;
break;
default:
assert(ret < 0);
cb->status = ret;
break;
}
if (cb->co) {
qemu_coroutine_enter(cb->co, NULL);
}
}
static int coroutine_fn bdrv_qed_co_is_allocated(BlockDriverState *bs,
static int64_t coroutine_fn bdrv_qed_co_get_block_status(BlockDriverState *bs,
int64_t sector_num,
int nb_sectors, int *pnum)
{
BDRVQEDState *s = bs->opaque;
uint64_t pos = (uint64_t)sector_num * BDRV_SECTOR_SIZE;
size_t len = (size_t)nb_sectors * BDRV_SECTOR_SIZE;
QEDIsAllocatedCB cb = {
.is_allocated = -1,
.bs = bs,
.pos = (uint64_t)sector_num * BDRV_SECTOR_SIZE,
.status = BDRV_BLOCK_OFFSET_MASK,
.pnum = pnum,
};
QEDRequest request = { .l2_table = NULL };
qed_find_cluster(s, &request, pos, len, qed_is_allocated_cb, &cb);
qed_find_cluster(s, &request, cb.pos, len, qed_is_allocated_cb, &cb);
/* Now sleep if the callback wasn't invoked immediately */
while (cb.is_allocated == -1) {
while (cb.status == BDRV_BLOCK_OFFSET_MASK) {
cb.co = qemu_coroutine_self();
qemu_coroutine_yield();
}
qed_unref_l2_cache_entry(request.l2_table);
return cb.is_allocated;
return cb.status;
}
static int bdrv_qed_make_empty(BlockDriverState *bs)
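The switch from a boolean is_allocated result to an int64_t block status packs flag bits and a byte offset into one value. The sketch below illustrates that encoding under stated assumptions: the `BLK_*` names, their numeric values (1, 2, 4) and the 512-byte sector size are choices made for this example, not values taken from this diff; only the idea of flags in the low bits and a sector-aligned offset in the remaining bits comes from the code above.

```c
#include <stdint.h>

#define SECTOR_SIZE      512                 /* assumed sector size */
#define BLK_DATA         0x01                /* assumed flag values */
#define BLK_ZERO         0x02
#define BLK_OFFSET_VALID 0x04
/* Everything above the low sector-offset bits carries the byte offset. */
#define BLK_OFFSET_MASK  (~(int64_t)(SECTOR_SIZE - 1))

/* Combine flags with a sector-aligned byte offset. */
static int64_t pack_status(int64_t flags, int64_t byte_offset)
{
    return flags | (byte_offset & BLK_OFFSET_MASK);
}

/* Extract the byte offset; meaningful only if BLK_OFFSET_VALID is set. */
static int64_t status_offset(int64_t status)
{
    return status & BLK_OFFSET_MASK;
}

/* Extract the flag bits. */
static int status_flags(int64_t status)
{
    return (int)(status & ~BLK_OFFSET_MASK);
}
```

A caller that previously tested the return value for non-zero would, in this scheme, test the data flag instead, and read the offset only when the offset-valid flag is present.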
@ -1526,7 +1555,7 @@ static void bdrv_qed_invalidate_cache(BlockDriverState *bs)
bdrv_qed_close(bs);
memset(s, 0, sizeof(BDRVQEDState));
bdrv_qed_open(bs, NULL, bs->open_flags);
bdrv_qed_open(bs, NULL, bs->open_flags, NULL);
}
static int bdrv_qed_check(BlockDriverState *bs, BdrvCheckResult *result,
@ -1575,7 +1604,7 @@ static BlockDriver bdrv_qed = {
.bdrv_reopen_prepare = bdrv_qed_reopen_prepare,
.bdrv_create = bdrv_qed_create,
.bdrv_has_zero_init = bdrv_has_zero_init_1,
.bdrv_co_is_allocated = bdrv_qed_co_is_allocated,
.bdrv_co_get_block_status = bdrv_qed_co_get_block_status,
.bdrv_make_empty = bdrv_qed_make_empty,
.bdrv_aio_readv = bdrv_qed_aio_readv,
.bdrv_aio_writev = bdrv_qed_aio_writev,


@ -100,7 +100,7 @@ typedef struct {
/* if (features & QED_F_BACKING_FILE) */
uint32_t backing_filename_offset; /* in bytes from start of header */
uint32_t backing_filename_size; /* in bytes */
} QEDHeader;
} QEMU_PACKED QEDHeader;
typedef struct {
uint64_t offsets[0]; /* in bytes */


@ -276,7 +276,7 @@ static QemuOptsList raw_runtime_opts = {
};
static int raw_open_common(BlockDriverState *bs, QDict *options,
int bdrv_flags, int open_flags)
int bdrv_flags, int open_flags, Error **errp)
{
BDRVRawState *s = bs->opaque;
QemuOpts *opts;
@ -287,8 +287,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
opts = qemu_opts_create_nofail(&raw_runtime_opts);
qemu_opts_absorb_qdict(opts, options, &local_err);
if (error_is_set(&local_err)) {
qerror_report_err(local_err);
error_free(local_err);
error_propagate(errp, local_err);
ret = -EINVAL;
goto fail;
}
@ -297,6 +296,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
ret = raw_normalize_devicepath(&filename);
if (ret != 0) {
error_setg_errno(errp, -ret, "Could not normalize device path");
goto fail;
}
@ -318,6 +318,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
if (raw_set_aio(&s->aio_ctx, &s->use_aio, bdrv_flags)) {
qemu_close(fd);
ret = -errno;
error_setg_errno(errp, -ret, "Could not set AIO state");
goto fail;
}
#endif
@ -335,12 +336,19 @@ fail:
return ret;
}
static int raw_open(BlockDriverState *bs, QDict *options, int flags)
static int raw_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVRawState *s = bs->opaque;
Error *local_err = NULL;
int ret;
s->type = FTYPE_FILE;
return raw_open_common(bs, options, flags, 0);
ret = raw_open_common(bs, options, flags, 0, &local_err);
if (error_is_set(&local_err)) {
error_propagate(errp, local_err);
}
return ret;
}
static int raw_reopen_prepare(BDRVReopenState *state,
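The local_err / error_propagate() pattern that these open functions adopt can be sketched without QEMU's headers. Everything below is a hypothetical miniature: `Err`, `err_set()`, `err_propagate()`, `open_common()` and `open_wrapper()` are invented for this example and only imitate the shape of the calls in the diff (the callee fills a local error object, the wrapper hands it to the caller's `errp` or drops it when the caller passed NULL).

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature error object. */
typedef struct Err {
    char msg[64];
} Err;

/* Store a message in *errp unless the caller passed NULL. */
static void err_set(Err **errp, const char *msg)
{
    Err *e;

    if (!errp) {
        return;                 /* caller does not want details */
    }
    e = calloc(1, sizeof(*e));
    if (!e) {
        return;
    }
    snprintf(e->msg, sizeof(e->msg), "%s", msg);
    *errp = e;
}

/* Hand a local error to the caller, or free it if nobody wants it. */
static void err_propagate(Err **dst, Err *local)
{
    if (!local) {
        return;
    }
    if (dst && !*dst) {
        *dst = local;
    } else {
        free(local);
    }
}

/* Callee: reports failure both as a negative errno and as a message. */
static int open_common(int should_fail, Err **errp)
{
    if (should_fail) {
        err_set(errp, "Could not set AIO state");
        return -EINVAL;
    }
    return 0;
}

/* Wrapper: collects the error locally, then propagates it upward. */
static int open_wrapper(int should_fail, Err **errp)
{
    Err *local = NULL;
    int ret = open_common(should_fail, &local);

    err_propagate(errp, local);
    return ret;
}

/* Usage: returns 1 when the wrapper's caller sees the expected message. */
static int demo_caller_sees(const char *expected)
{
    Err *err = NULL;
    int ret = open_wrapper(1, &err);
    int ok = ret < 0 && err && strcmp(err->msg, expected) == 0;

    free(err);
    return ok;
}
```

Collecting into a local pointer first means the wrapper can always inspect the error itself, even when its own caller passed NULL and is only interested in the return code.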
@ -365,6 +373,7 @@ static int raw_reopen_prepare(BDRVReopenState *state,
* valid in the 'false' condition even if aio_ctx is set, and raw_set_aio()
* won't override aio_ctx if aio_ctx is non-NULL */
if (raw_set_aio(&s->aio_ctx, &raw_s->use_aio, state->flags)) {
error_setg(errp, "Could not set AIO state");
return -1;
}
#endif
@ -416,6 +425,7 @@ static int raw_reopen_prepare(BDRVReopenState *state,
assert(!(raw_s->open_flags & O_CREAT));
raw_s->fd = qemu_open(state->bs->filename, raw_s->open_flags);
if (raw_s->fd == -1) {
error_setg_errno(errp, errno, "Could not reopen file");
ret = -1;
}
}
@ -1040,7 +1050,8 @@ static int64_t raw_get_allocated_file_size(BlockDriverState *bs)
return (int64_t)st.st_blocks * 512;
}
static int raw_create(const char *filename, QEMUOptionParameter *options)
static int raw_create(const char *filename, QEMUOptionParameter *options,
Error **errp)
{
int fd;
int result = 0;
@ -1058,12 +1069,15 @@ static int raw_create(const char *filename, QEMUOptionParameter *options)
0644);
if (fd < 0) {
result = -errno;
error_setg_errno(errp, -result, "Could not create file");
} else {
if (ftruncate(fd, total_size * BDRV_SECTOR_SIZE) != 0) {
result = -errno;
error_setg_errno(errp, -result, "Could not resize file");
}
if (qemu_close(fd) != 0) {
result = -errno;
error_setg_errno(errp, -result, "Could not close the new file");
}
}
return result;
@ -1084,12 +1098,12 @@ static int raw_create(const char *filename, QEMUOptionParameter *options)
* 'nb_sectors' is the max value 'pnum' should be set to. If nb_sectors goes
* beyond the end of the disk image it will be clamped.
*/
static int coroutine_fn raw_co_is_allocated(BlockDriverState *bs,
static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
int64_t sector_num,
int nb_sectors, int *pnum)
{
off_t start, data, hole;
int ret;
int64_t ret;
ret = fd_open(bs);
if (ret < 0) {
@ -1097,6 +1111,7 @@ static int coroutine_fn raw_co_is_allocated(BlockDriverState *bs,
}
start = sector_num * BDRV_SECTOR_SIZE;
ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
#ifdef CONFIG_FIEMAP
@ -1114,7 +1129,7 @@ static int coroutine_fn raw_co_is_allocated(BlockDriverState *bs,
if (ioctl(s->fd, FS_IOC_FIEMAP, &f) == -1) {
/* Assume everything is allocated. */
*pnum = nb_sectors;
return 1;
return ret;
}
if (f.fm.fm_mapped_extents == 0) {
@ -1127,6 +1142,9 @@ static int coroutine_fn raw_co_is_allocated(BlockDriverState *bs,
} else {
data = f.fe.fe_logical;
hole = f.fe.fe_logical + f.fe.fe_length;
if (f.fe.fe_flags & FIEMAP_EXTENT_UNWRITTEN) {
ret |= BDRV_BLOCK_ZERO;
}
}
#elif defined SEEK_HOLE && defined SEEK_DATA
@ -1141,7 +1159,7 @@ static int coroutine_fn raw_co_is_allocated(BlockDriverState *bs,
/* Most likely EINVAL. Assume everything is allocated. */
*pnum = nb_sectors;
return 1;
return ret;
}
if (hole > start) {
@ -1154,19 +1172,21 @@ static int coroutine_fn raw_co_is_allocated(BlockDriverState *bs,
}
}
#else
*pnum = nb_sectors;
return 1;
data = 0;
hole = start + nb_sectors * BDRV_SECTOR_SIZE;
#endif
if (data <= start) {
/* On a data extent, compute sectors to the end of the extent. */
*pnum = MIN(nb_sectors, (hole - start) / BDRV_SECTOR_SIZE);
return 1;
} else {
/* On a hole, compute sectors to the beginning of the next extent. */
*pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
return 0;
ret &= ~BDRV_BLOCK_DATA;
ret |= BDRV_BLOCK_ZERO;
}
return ret;
}
static coroutine_fn BlockDriverAIOCB *raw_aio_discard(BlockDriverState *bs,
@ -1192,6 +1212,7 @@ static BlockDriver bdrv_file = {
.format_name = "file",
.protocol_name = "file",
.instance_size = sizeof(BDRVRawState),
.bdrv_needs_filename = true,
.bdrv_probe = NULL, /* no probe for protocols */
.bdrv_file_open = raw_open,
.bdrv_reopen_prepare = raw_reopen_prepare,
@ -1200,7 +1221,7 @@ static BlockDriver bdrv_file = {
.bdrv_close = raw_close,
.bdrv_create = raw_create,
.bdrv_has_zero_init = bdrv_has_zero_init_1,
.bdrv_co_is_allocated = raw_co_is_allocated,
.bdrv_co_get_block_status = raw_co_get_block_status,
.bdrv_aio_readv = raw_aio_readv,
.bdrv_aio_writev = raw_aio_writev,
@ -1325,9 +1346,11 @@ static int check_hdev_writable(BDRVRawState *s)
return 0;
}
static int hdev_open(BlockDriverState *bs, QDict *options, int flags)
static int hdev_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVRawState *s = bs->opaque;
Error *local_err = NULL;
int ret;
const char *filename = qdict_get_str(options, "filename");
@ -1371,8 +1394,11 @@ static int hdev_open(BlockDriverState *bs, QDict *options, int flags)
}
#endif
ret = raw_open_common(bs, options, flags, 0);
ret = raw_open_common(bs, options, flags, 0, &local_err);
if (ret < 0) {
if (error_is_set(&local_err)) {
error_propagate(errp, local_err);
}
return ret;
}
@ -1380,6 +1406,7 @@ static int hdev_open(BlockDriverState *bs, QDict *options, int flags)
ret = check_hdev_writable(s);
if (ret < 0) {
raw_close(bs);
error_setg_errno(errp, -ret, "The device is not writable");
return ret;
}
}
@ -1498,7 +1525,8 @@ static coroutine_fn BlockDriverAIOCB *hdev_aio_discard(BlockDriverState *bs,
cb, opaque, QEMU_AIO_DISCARD|QEMU_AIO_BLKDEV);
}
static int hdev_create(const char *filename, QEMUOptionParameter *options)
static int hdev_create(const char *filename, QEMUOptionParameter *options,
Error **errp)
{
int fd;
int ret = 0;
@ -1514,15 +1542,23 @@ static int hdev_create(const char *filename, QEMUOptionParameter *options)
}
fd = qemu_open(filename, O_WRONLY | O_BINARY);
if (fd < 0)
return -errno;
if (fstat(fd, &stat_buf) < 0)
if (fd < 0) {
ret = -errno;
else if (!S_ISBLK(stat_buf.st_mode) && !S_ISCHR(stat_buf.st_mode))
error_setg_errno(errp, -ret, "Could not open device");
return ret;
}
if (fstat(fd, &stat_buf) < 0) {
ret = -errno;
error_setg_errno(errp, -ret, "Could not stat device");
} else if (!S_ISBLK(stat_buf.st_mode) && !S_ISCHR(stat_buf.st_mode)) {
error_setg(errp,
"The given file is neither a block nor a character device");
ret = -ENODEV;
else if (lseek(fd, 0, SEEK_END) < total_size * BDRV_SECTOR_SIZE)
} else if (lseek(fd, 0, SEEK_END) < total_size * BDRV_SECTOR_SIZE) {
error_setg(errp, "Device is too small");
ret = -ENOSPC;
}
qemu_close(fd);
return ret;
@ -1532,6 +1568,7 @@ static BlockDriver bdrv_host_device = {
.format_name = "host_device",
.protocol_name = "host_device",
.instance_size = sizeof(BDRVRawState),
.bdrv_needs_filename = true,
.bdrv_probe_device = hdev_probe_device,
.bdrv_file_open = hdev_open,
.bdrv_close = raw_close,
@ -1559,17 +1596,23 @@ static BlockDriver bdrv_host_device = {
};
#ifdef __linux__
static int floppy_open(BlockDriverState *bs, QDict *options, int flags)
static int floppy_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVRawState *s = bs->opaque;
Error *local_err = NULL;
int ret;
s->type = FTYPE_FD;
/* open will not fail even if no floppy is inserted, so add O_NONBLOCK */
ret = raw_open_common(bs, options, flags, O_NONBLOCK);
if (ret)
ret = raw_open_common(bs, options, flags, O_NONBLOCK, &local_err);
if (ret) {
if (error_is_set(&local_err)) {
error_propagate(errp, local_err);
}
return ret;
}
/* close fd so that we can reopen it as needed */
qemu_close(s->fd);
@ -1656,6 +1699,7 @@ static BlockDriver bdrv_host_floppy = {
.format_name = "host_floppy",
.protocol_name = "host_floppy",
.instance_size = sizeof(BDRVRawState),
.bdrv_needs_filename = true,
.bdrv_probe_device = floppy_probe_device,
.bdrv_file_open = floppy_open,
.bdrv_close = raw_close,
@ -1670,7 +1714,8 @@ static BlockDriver bdrv_host_floppy = {
.bdrv_aio_flush = raw_aio_flush,
.bdrv_truncate = raw_truncate,
.bdrv_getlength = raw_getlength,
.bdrv_getlength = raw_getlength,
.has_variable_length = true,
.bdrv_get_allocated_file_size
= raw_get_allocated_file_size,
@ -1680,14 +1725,21 @@ static BlockDriver bdrv_host_floppy = {
.bdrv_eject = floppy_eject,
};
static int cdrom_open(BlockDriverState *bs, QDict *options, int flags)
static int cdrom_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVRawState *s = bs->opaque;
Error *local_err = NULL;
int ret;
s->type = FTYPE_CD;
/* open will not fail even if no CD is inserted, so add O_NONBLOCK */
return raw_open_common(bs, options, flags, O_NONBLOCK);
ret = raw_open_common(bs, options, flags, O_NONBLOCK, &local_err);
if (error_is_set(&local_err)) {
error_propagate(errp, local_err);
}
return ret;
}
static int cdrom_probe_device(const char *filename)
@ -1757,6 +1809,7 @@ static BlockDriver bdrv_host_cdrom = {
.format_name = "host_cdrom",
.protocol_name = "host_cdrom",
.instance_size = sizeof(BDRVRawState),
.bdrv_needs_filename = true,
.bdrv_probe_device = cdrom_probe_device,
.bdrv_file_open = cdrom_open,
.bdrv_close = raw_close,
@ -1771,7 +1824,8 @@ static BlockDriver bdrv_host_cdrom = {
.bdrv_aio_flush = raw_aio_flush,
.bdrv_truncate = raw_truncate,
.bdrv_getlength = raw_getlength,
.bdrv_getlength = raw_getlength,
.has_variable_length = true,
.bdrv_get_allocated_file_size
= raw_get_allocated_file_size,
@ -1787,16 +1841,22 @@ static BlockDriver bdrv_host_cdrom = {
#endif /* __linux__ */
#if defined (__FreeBSD__) || defined(__FreeBSD_kernel__)
static int cdrom_open(BlockDriverState *bs, QDict *options, int flags)
static int cdrom_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVRawState *s = bs->opaque;
Error *local_err = NULL;
int ret;
s->type = FTYPE_CD;
ret = raw_open_common(bs, options, flags, 0);
if (ret)
ret = raw_open_common(bs, options, flags, 0, &local_err);
if (ret) {
if (error_is_set(&local_err)) {
error_propagate(errp, local_err);
}
return ret;
}
/* make sure the door isn't locked at this time */
ioctl(s->fd, CDIOCALLOW);
@ -1878,6 +1938,7 @@ static BlockDriver bdrv_host_cdrom = {
.format_name = "host_cdrom",
.protocol_name = "host_cdrom",
.instance_size = sizeof(BDRVRawState),
.bdrv_needs_filename = true,
.bdrv_probe_device = cdrom_probe_device,
.bdrv_file_open = cdrom_open,
.bdrv_close = raw_close,
@ -1892,7 +1953,8 @@ static BlockDriver bdrv_host_cdrom = {
.bdrv_aio_flush = raw_aio_flush,
.bdrv_truncate = raw_truncate,
.bdrv_getlength = raw_getlength,
.bdrv_getlength = raw_getlength,
.has_variable_length = true,
.bdrv_get_allocated_file_size
= raw_get_allocated_file_size,


@ -85,6 +85,7 @@ static size_t handle_aiocb_rw(RawWin32AIOData *aiocb)
ret_count = 0;
}
if (ret_count != len) {
offset += ret_count;
break;
}
offset += len;
@ -234,7 +235,8 @@ static QemuOptsList raw_runtime_opts = {
},
};
static int raw_open(BlockDriverState *bs, QDict *options, int flags)
static int raw_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVRawState *s = bs->opaque;
int access_flags;
@ -249,8 +251,7 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags)
opts = qemu_opts_create_nofail(&raw_runtime_opts);
qemu_opts_absorb_qdict(opts, options, &local_err);
if (error_is_set(&local_err)) {
qerror_report_err(local_err);
error_free(local_err);
error_propagate(errp, local_err);
ret = -EINVAL;
goto fail;
}
@ -262,6 +263,7 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags)
if ((flags & BDRV_O_NATIVE_AIO) && aio == NULL) {
aio = win32_aio_init();
if (aio == NULL) {
error_setg(errp, "Could not initialize AIO");
ret = -EINVAL;
goto fail;
}
@ -285,6 +287,7 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags)
ret = win32_aio_attach(aio, s->hfile);
if (ret < 0) {
CloseHandle(s->hfile);
error_setg_errno(errp, -ret, "Could not enable AIO");
goto fail;
}
s->aio = aio;
@ -420,7 +423,8 @@ static int64_t raw_get_allocated_file_size(BlockDriverState *bs)
return st.st_size;
}
static int raw_create(const char *filename, QEMUOptionParameter *options)
static int raw_create(const char *filename, QEMUOptionParameter *options,
Error **errp)
{
int fd;
int64_t total_size = 0;
@ -435,8 +439,10 @@ static int raw_create(const char *filename, QEMUOptionParameter *options)
fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
0644);
if (fd < 0)
if (fd < 0) {
error_setg_errno(errp, errno, "Could not create file");
return -EIO;
}
set_sparse(fd);
ftruncate(fd, total_size * 512);
qemu_close(fd);
@ -456,6 +462,7 @@ static BlockDriver bdrv_file = {
.format_name = "file",
.protocol_name = "file",
.instance_size = sizeof(BDRVRawState),
.bdrv_needs_filename = true,
.bdrv_file_open = raw_open,
.bdrv_close = raw_close,
.bdrv_create = raw_create,
@ -531,17 +538,34 @@ static int hdev_probe_device(const char *filename)
return 0;
}
static int hdev_open(BlockDriverState *bs, QDict *options, int flags)
static int hdev_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVRawState *s = bs->opaque;
int access_flags, create_flags;
int ret = 0;
DWORD overlapped;
char device_name[64];
const char *filename = qdict_get_str(options, "filename");
Error *local_err = NULL;
const char *filename;
QemuOpts *opts = qemu_opts_create_nofail(&raw_runtime_opts);
qemu_opts_absorb_qdict(opts, options, &local_err);
if (error_is_set(&local_err)) {
error_propagate(errp, local_err);
ret = -EINVAL;
goto done;
}
filename = qemu_opt_get(opts, "filename");
if (strstart(filename, "/dev/cdrom", NULL)) {
if (find_cdrom(device_name, sizeof(device_name)) < 0)
return -ENOENT;
if (find_cdrom(device_name, sizeof(device_name)) < 0) {
error_setg(errp, "Could not open CD-ROM drive");
ret = -ENOENT;
goto done;
}
filename = device_name;
} else {
/* transform drive letters into device name */
@ -564,17 +588,25 @@ static int hdev_open(BlockDriverState *bs, QDict *options, int flags)
if (s->hfile == INVALID_HANDLE_VALUE) {
int err = GetLastError();
if (err == ERROR_ACCESS_DENIED)
return -EACCES;
return -1;
if (err == ERROR_ACCESS_DENIED) {
ret = -EACCES;
} else {
ret = -EINVAL;
}
error_setg_errno(errp, -ret, "Could not open device");
goto done;
}
return 0;
done:
qemu_opts_del(opts);
return ret;
}
static BlockDriver bdrv_host_device = {
.format_name = "host_device",
.protocol_name = "host_device",
.instance_size = sizeof(BDRVRawState),
.bdrv_needs_filename = true,
.bdrv_probe_device = hdev_probe_device,
.bdrv_file_open = hdev_open,
.bdrv_close = raw_close,
@ -583,7 +615,9 @@ static BlockDriver bdrv_host_device = {
.bdrv_aio_writev = raw_aio_writev,
.bdrv_aio_flush = raw_aio_flush,
.bdrv_getlength = raw_getlength,
.bdrv_getlength = raw_getlength,
.has_variable_length = true,
.bdrv_get_allocated_file_size
= raw_get_allocated_file_size,
};
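
Review note: the reworked hdev_open() above funnels every failure through a single "done" label so the options object is released exactly once. A minimal sketch of that single-exit pattern follows; opts_create(), opts_del(), open_device() and the live_opts counter are hypothetical stand-ins for illustration, not QEMU APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter of option objects that are still allocated. */
static int live_opts = 0;

/* Hypothetical stand-in for creating the per-open options object. */
static char *opts_create(const char *filename)
{
    char *copy = malloc(strlen(filename) + 1);
    if (copy) {
        strcpy(copy, filename);
        live_opts++;
    }
    return copy;
}

/* Hypothetical stand-in for releasing the options object. */
static void opts_del(char *opts)
{
    if (opts) {
        free(opts);
        live_opts--;
    }
}

/* Every failure sets ret (and a message) and jumps to "done",
 * so the cleanup runs on all paths after the allocation. */
static int open_device(const char *filename, const char **errmsg)
{
    int ret = 0;
    char *opts = opts_create(filename);

    if (!opts) {
        *errmsg = "out of memory";
        return -ENOMEM;          /* nothing to clean up yet */
    }
    if (opts[0] == '\0') {
        *errmsg = "empty filename";
        ret = -EINVAL;
        goto done;
    }
    if (strncmp(opts, "/dev/", 5) != 0) {
        *errmsg = "Could not open device";
        ret = -ENOENT;
        goto done;
    }

done:
    opts_del(opts);
    return ret;
}
```

Calling open_device("", &msg) in this sketch returns -EINVAL and still leaves live_opts at zero, which is the property the "done" label is meant to guarantee.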


@ -1,13 +1,17 @@
/*
* Block driver for RAW format
/* BlockDriver implementation for "raw"
*
* Copyright (c) 2006 Fabrice Bellard
* Copyright (C) 2010, 2013, Red Hat, Inc.
* Copyright (C) 2010, Blue Swirl <blauwirbel@gmail.com>
* Copyright (C) 2009, Anthony Liguori <aliguori@us.ibm.com>
*
* Author:
* Laszlo Ersek <lersek@redhat.com>
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* of this software and associated documentation files (the "Software"), to
* deal in the Software without restriction, including without limitation the
* rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
* sell copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
@ -15,27 +19,27 @@
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
#include "qemu-common.h"
#include "block/block_int.h"
#include "qemu/module.h"
#include "qemu/option.h"
static int raw_open(BlockDriverState *bs, QDict *options, int flags)
{
bs->sg = bs->file->sg;
return 0;
}
static QEMUOptionParameter raw_create_options[] = {
{
.name = BLOCK_OPT_SIZE,
.type = OPT_SIZE,
.help = "Virtual disk size"
},
{ 0 }
};
/* We have nothing to do for raw reopen, stubs just return
* success */
static int raw_reopen_prepare(BDRVReopenState *state,
BlockReopenQueue *queue, Error **errp)
static int raw_reopen_prepare(BDRVReopenState *reopen_state,
BlockReopenQueue *queue, Error **errp)
{
return 0;
}
@ -54,45 +58,42 @@ static int coroutine_fn raw_co_writev(BlockDriverState *bs, int64_t sector_num,
return bdrv_co_writev(bs->file, sector_num, nb_sectors, qiov);
}
static void raw_close(BlockDriverState *bs)
{
}
static int coroutine_fn raw_co_is_allocated(BlockDriverState *bs,
static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
int64_t sector_num,
int nb_sectors, int *pnum)
{
return bdrv_co_is_allocated(bs->file, sector_num, nb_sectors, pnum);
*pnum = nb_sectors;
return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_DATA |
(sector_num << BDRV_SECTOR_BITS);
}
static int coroutine_fn raw_co_write_zeroes(BlockDriverState *bs,
int64_t sector_num,
int nb_sectors)
int64_t sector_num, int nb_sectors)
{
return bdrv_co_write_zeroes(bs->file, sector_num, nb_sectors);
}
static int coroutine_fn raw_co_discard(BlockDriverState *bs,
int64_t sector_num, int nb_sectors)
{
return bdrv_co_discard(bs->file, sector_num, nb_sectors);
}
static int64_t raw_getlength(BlockDriverState *bs)
{
return bdrv_getlength(bs->file);
}
static int raw_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
{
return bdrv_get_info(bs->file, bdi);
}
static int raw_truncate(BlockDriverState *bs, int64_t offset)
{
return bdrv_truncate(bs->file, offset);
}
static int raw_probe(const uint8_t *buf, int buf_size, const char *filename)
{
return 1; /* everything can be opened as raw image */
}
static int coroutine_fn raw_co_discard(BlockDriverState *bs,
int64_t sector_num, int nb_sectors)
{
return bdrv_co_discard(bs->file, sector_num, nb_sectors);
}
static int raw_is_inserted(BlockDriverState *bs)
{
return bdrv_is_inserted(bs->file);
@ -115,73 +116,78 @@ static void raw_lock_medium(BlockDriverState *bs, bool locked)
static int raw_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
{
return bdrv_ioctl(bs->file, req, buf);
return bdrv_ioctl(bs->file, req, buf);
}
static BlockDriverAIOCB *raw_aio_ioctl(BlockDriverState *bs,
unsigned long int req, void *buf,
BlockDriverCompletionFunc *cb, void *opaque)
unsigned long int req, void *buf,
BlockDriverCompletionFunc *cb,
void *opaque)
{
return bdrv_aio_ioctl(bs->file, req, buf, cb, opaque);
return bdrv_aio_ioctl(bs->file, req, buf, cb, opaque);
}
static int raw_create(const char *filename, QEMUOptionParameter *options)
{
return bdrv_create_file(filename, options);
}
static QEMUOptionParameter raw_create_options[] = {
{
.name = BLOCK_OPT_SIZE,
.type = OPT_SIZE,
.help = "Virtual disk size"
},
{ NULL }
};
static int raw_has_zero_init(BlockDriverState *bs)
{
return bdrv_has_zero_init(bs->file);
}
static int raw_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
static int raw_create(const char *filename, QEMUOptionParameter *options,
Error **errp)
{
return bdrv_get_info(bs->file, bdi);
Error *local_err = NULL;
int ret;
ret = bdrv_create_file(filename, options, &local_err);
if (error_is_set(&local_err)) {
error_propagate(errp, local_err);
}
return ret;
}
static int raw_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
bs->sg = bs->file->sg;
return 0;
}
static void raw_close(BlockDriverState *bs)
{
}
static int raw_probe(const uint8_t *buf, int buf_size, const char *filename)
{
/* smallest possible positive score so that raw is used if and only if no
* other block driver works
*/
return 1;
}
static BlockDriver bdrv_raw = {
.format_name = "raw",
/* It's really 0, but we need to make g_malloc() happy */
.instance_size = 1,
.bdrv_open = raw_open,
.bdrv_close = raw_close,
.bdrv_reopen_prepare = raw_reopen_prepare,
.bdrv_co_readv = raw_co_readv,
.bdrv_co_writev = raw_co_writev,
.bdrv_co_is_allocated = raw_co_is_allocated,
.bdrv_co_write_zeroes = raw_co_write_zeroes,
.bdrv_co_discard = raw_co_discard,
.bdrv_probe = raw_probe,
.bdrv_getlength = raw_getlength,
.bdrv_get_info = raw_get_info,
.bdrv_truncate = raw_truncate,
.bdrv_is_inserted = raw_is_inserted,
.bdrv_media_changed = raw_media_changed,
.bdrv_eject = raw_eject,
.bdrv_lock_medium = raw_lock_medium,
.bdrv_ioctl = raw_ioctl,
.bdrv_aio_ioctl = raw_aio_ioctl,
.bdrv_create = raw_create,
.create_options = raw_create_options,
.bdrv_has_zero_init = raw_has_zero_init,
.format_name = "raw",
.bdrv_probe = &raw_probe,
.bdrv_reopen_prepare = &raw_reopen_prepare,
.bdrv_open = &raw_open,
.bdrv_close = &raw_close,
.bdrv_create = &raw_create,
.bdrv_co_readv = &raw_co_readv,
.bdrv_co_writev = &raw_co_writev,
.bdrv_co_write_zeroes = &raw_co_write_zeroes,
.bdrv_co_discard = &raw_co_discard,
.bdrv_co_get_block_status = &raw_co_get_block_status,
.bdrv_truncate = &raw_truncate,
.bdrv_getlength = &raw_getlength,
.has_variable_length = true,
.bdrv_get_info = &raw_get_info,
.bdrv_is_inserted = &raw_is_inserted,
.bdrv_media_changed = &raw_media_changed,
.bdrv_eject = &raw_eject,
.bdrv_lock_medium = &raw_lock_medium,
.bdrv_ioctl = &raw_ioctl,
.bdrv_aio_ioctl = &raw_aio_ioctl,
.create_options = &raw_create_options[0],
.bdrv_has_zero_init = &raw_has_zero_init
};
static void bdrv_raw_init(void)
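
Review note: raw_co_get_block_status() above returns flag bits and a byte offset packed into one int64_t. The sketch below shows how such a value can be built and taken apart again. The constant values (a 9-bit sector shift and the three flag bits) and the helper names are assumptions chosen for illustration; they are not taken from this diff.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed illustrative values, not copied from the QEMU headers. */
#define SECTOR_BITS        9
#define BLOCK_DATA         0x01
#define BLOCK_OFFSET_VALID 0x04
#define BLOCK_RAW          0x08
#define BLOCK_FLAG_MASK    ((INT64_C(1) << SECTOR_BITS) - 1)

/* Flags occupy the low SECTOR_BITS bits; the sector-aligned byte
 * offset occupies the bits above them. */
static int64_t pack_block_status(int64_t sector_num)
{
    return BLOCK_RAW | BLOCK_OFFSET_VALID | BLOCK_DATA |
           (sector_num << SECTOR_BITS);
}

/* Recover the byte offset by clearing the flag bits. */
static int64_t status_offset(int64_t status)
{
    return status & ~BLOCK_FLAG_MASK;
}

/* Recover the flag bits by masking off the offset. */
static int status_flags(int64_t status)
{
    return (int)(status & BLOCK_FLAG_MASK);
}
```

The packing only works because a sector-aligned byte offset always has its low SECTOR_BITS bits clear, which leaves room for the flags.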


@ -100,7 +100,6 @@ typedef struct BDRVRBDState {
rados_ioctx_t io_ctx;
rbd_image_t image;
char name[RBD_MAX_IMAGE_NAME_SIZE];
int qemu_aio_count;
char *snap;
int event_reader_pos;
RADOSCB *event_rcb;
@ -288,7 +287,8 @@ static int qemu_rbd_set_conf(rados_t cluster, const char *conf)
return ret;
}
static int qemu_rbd_create(const char *filename, QEMUOptionParameter *options)
static int qemu_rbd_create(const char *filename, QEMUOptionParameter *options,
Error **errp)
{
int64_t bytes = 0;
int64_t objsize;
@ -428,19 +428,11 @@ static void qemu_rbd_aio_event_reader(void *opaque)
if (s->event_reader_pos == sizeof(s->event_rcb)) {
s->event_reader_pos = 0;
qemu_rbd_complete_aio(s->event_rcb);
s->qemu_aio_count--;
}
}
} while (ret < 0 && errno == EINTR);
}
static int qemu_rbd_aio_flush_cb(void *opaque)
{
BDRVRBDState *s = opaque;
return (s->qemu_aio_count > 0);
}
/* TODO Convert to fine grained options */
static QemuOptsList runtime_opts = {
.name = "rbd",
@ -455,7 +447,8 @@ static QemuOptsList runtime_opts = {
},
};
static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags)
static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVRBDState *s = bs->opaque;
char pool[RBD_MAX_POOL_NAME_SIZE];
@ -554,7 +547,7 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags)
fcntl(s->fds[0], F_SETFL, O_NONBLOCK);
fcntl(s->fds[1], F_SETFL, O_NONBLOCK);
qemu_aio_set_fd_handler(s->fds[RBD_FD_READ], qemu_rbd_aio_event_reader,
NULL, qemu_rbd_aio_flush_cb, s);
NULL, s);
qemu_opts_del(opts);
@ -578,7 +571,7 @@ static void qemu_rbd_close(BlockDriverState *bs)
close(s->fds[0]);
close(s->fds[1]);
qemu_aio_set_fd_handler(s->fds[RBD_FD_READ], NULL, NULL, NULL, NULL);
qemu_aio_set_fd_handler(s->fds[RBD_FD_READ], NULL, NULL, NULL);
rbd_close(s->image);
rados_ioctx_destroy(s->io_ctx);
@ -741,8 +734,6 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
off = sector_num * BDRV_SECTOR_SIZE;
size = nb_sectors * BDRV_SECTOR_SIZE;
s->qemu_aio_count++; /* All the RADOSCB */
rcb = g_malloc(sizeof(RADOSCB));
rcb->done = 0;
rcb->acb = acb;
@ -779,7 +770,6 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
failed:
g_free(rcb);
s->qemu_aio_count--;
qemu_aio_release(acb);
return NULL;
}
@ -903,12 +893,31 @@ static int qemu_rbd_snap_create(BlockDriverState *bs,
}
static int qemu_rbd_snap_remove(BlockDriverState *bs,
const char *snapshot_name)
const char *snapshot_id,
const char *snapshot_name,
Error **errp)
{
BDRVRBDState *s = bs->opaque;
int r;
if (!snapshot_name) {
error_setg(errp, "rbd need a valid snapshot name");
return -EINVAL;
}
/* If snapshot_id is specified, it must be equal to name, see
qemu_rbd_snap_list() */
if (snapshot_id && strcmp(snapshot_id, snapshot_name)) {
error_setg(errp,
"rbd do not support snapshot id, it should be NULL or "
"equal to snapshot name");
return -EINVAL;
}
r = rbd_snap_remove(s->image, snapshot_name);
if (r < 0) {
error_setg_errno(errp, -r, "Failed to remove the snapshot");
}
return r;
}
@ -934,7 +943,7 @@ static int qemu_rbd_snap_list(BlockDriverState *bs,
do {
snaps = g_malloc(sizeof(*snaps) * max_snaps);
snap_count = rbd_snap_list(s->image, snaps, &max_snaps);
if (snap_count < 0) {
if (snap_count <= 0) {
g_free(snaps);
}
} while (snap_count == -ERANGE);
@ -958,6 +967,7 @@ static int qemu_rbd_snap_list(BlockDriverState *bs,
sn_info->vm_clock_nsec = 0;
}
rbd_snap_list_end(snaps);
g_free(snaps);
done:
*psn_tab = sn_tab;
@ -993,6 +1003,7 @@ static QEMUOptionParameter qemu_rbd_create_options[] = {
static BlockDriver bdrv_rbd = {
.format_name = "rbd",
.instance_size = sizeof(BDRVRBDState),
.bdrv_needs_filename = true,
.bdrv_file_open = qemu_rbd_open,
.bdrv_close = qemu_rbd_close,
.bdrv_create = qemu_rbd_create,
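
Review note: the qemu_rbd_snap_list() hunks above retry the listing call while it reports -ERANGE, free the buffer when nothing usable came back, and free it after use. A minimal sketch of that grow-and-retry loop is shown below. list_items() is a hypothetical lister that mimics the "return -ERANGE and report the needed size" convention; it is not the librbd function.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define TOTAL_ITEMS 5   /* hypothetical number of entries to list */

/* Hypothetical lister: if the buffer is too small, store the required
 * capacity in *max and return -ERANGE; otherwise fill it and return
 * the number of entries. */
static int list_items(int *out, int *max)
{
    if (*max < TOTAL_ITEMS) {
        *max = TOTAL_ITEMS;
        return -ERANGE;
    }
    for (int i = 0; i < TOTAL_ITEMS; i++) {
        out[i] = i * 10;
    }
    return TOTAL_ITEMS;
}

/* Retry with a larger buffer until the lister stops returning -ERANGE.
 * The buffer is freed whenever the call yields no entries (count <= 0),
 * so the caller only owns *result when the return value is positive. */
static int fetch_all(int **result)
{
    int max = 2;
    int count;
    int *buf;

    *result = NULL;
    do {
        buf = malloc(sizeof(*buf) * max);
        if (!buf) {
            return -ENOMEM;
        }
        count = list_items(buf, &max);
        if (count <= 0) {
            free(buf);
            buf = NULL;
        }
    } while (count == -ERANGE);

    if (count > 0) {
        *result = buf;
    }
    return count;
}
```

In this sketch the first attempt with a capacity of 2 fails, the buffer is released, and the second attempt succeeds with the capacity the lister asked for.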


@ -125,8 +125,9 @@ typedef struct SheepdogObjReq {
uint32_t data_length;
uint64_t oid;
uint64_t cow_oid;
uint32_t copies;
uint32_t rsvd;
uint8_t copies;
uint8_t copy_policy;
uint8_t reserved[6];
uint64_t offset;
} SheepdogObjReq;
@ -138,7 +139,9 @@ typedef struct SheepdogObjRsp {
uint32_t id;
uint32_t data_length;
uint32_t result;
uint32_t copies;
uint8_t copies;
uint8_t copy_policy;
uint8_t reserved[2];
uint32_t pad[6];
} SheepdogObjRsp;
@ -151,7 +154,9 @@ typedef struct SheepdogVdiReq {
uint32_t data_length;
uint64_t vdi_size;
uint32_t vdi_id;
uint32_t copies;
uint8_t copies;
uint8_t copy_policy;
uint8_t reserved[2];
uint32_t snapid;
uint32_t pad[3];
} SheepdogVdiReq;
@ -222,6 +227,11 @@ static inline uint64_t data_oid_to_idx(uint64_t oid)
return oid & (MAX_DATA_OBJS - 1);
}
static inline uint32_t oid_to_vid(uint64_t oid)
{
return (oid & ~VDI_BIT) >> VDI_SPACE_SHIFT;
}
static inline uint64_t vid_to_vdi_oid(uint32_t vid)
{
return VDI_BIT | ((uint64_t)vid << VDI_SPACE_SHIFT);
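
Review note: the new oid_to_vid() helper is the inverse of the existing vid-to-oid helpers. The sketch below round-trips a VDI id through both object id forms. The values of VDI_BIT and VDI_SPACE_SHIFT and the body of vid_to_data_oid() are assumptions made for illustration, since this hunk does not show them.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: VDI id in bits 32 and up, top bit marks a VDI object. */
#define VDI_SPACE_SHIFT 32
#define VDI_BIT         (UINT64_C(1) << 63)

/* Object id of the VDI (inode) object itself. */
static inline uint64_t vid_to_vdi_oid(uint32_t vid)
{
    return VDI_BIT | ((uint64_t)vid << VDI_SPACE_SHIFT);
}

/* Object id of data object number idx belonging to the VDI
 * (assumed form; the real helper is outside this hunk). */
static inline uint64_t vid_to_data_oid(uint32_t vid, uint32_t idx)
{
    return ((uint64_t)vid << VDI_SPACE_SHIFT) | idx;
}

/* Strip the VDI marker bit and shift the id back down. */
static inline uint32_t oid_to_vid(uint64_t oid)
{
    return (oid & ~VDI_BIT) >> VDI_SPACE_SHIFT;
}
```

Under these assumptions oid_to_vid() returns the same id for a VDI object and for any of its data objects, which is what the SD_RES_READONLY handling relies on when it compares against s->inode.vdi_id.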
@ -289,11 +299,14 @@ struct SheepdogAIOCB {
Coroutine *coroutine;
void (*aio_done_func)(SheepdogAIOCB *);
bool canceled;
bool cancelable;
bool *finished;
int nr_pending;
};
typedef struct BDRVSheepdogState {
BlockDriverState *bs;
SheepdogInode inode;
uint32_t min_dirty_data_idx;
@ -313,8 +326,11 @@ typedef struct BDRVSheepdogState {
Coroutine *co_recv;
uint32_t aioreq_seq_num;
/* Every aio request must be linked to either of these queues. */
QLIST_HEAD(inflight_aio_head, AIOReq) inflight_aio_head;
QLIST_HEAD(pending_aio_head, AIOReq) pending_aio_head;
QLIST_HEAD(failed_aio_head, AIOReq) failed_aio_head;
} BDRVSheepdogState;
static const char * sd_strerror(int err)
@ -403,6 +419,7 @@ static inline void free_aio_req(BDRVSheepdogState *s, AIOReq *aio_req)
{
SheepdogAIOCB *acb = aio_req->aiocb;
acb->cancelable = false;
QLIST_REMOVE(aio_req, aio_siblings);
g_free(aio_req);
@ -411,23 +428,68 @@ static inline void free_aio_req(BDRVSheepdogState *s, AIOReq *aio_req)
static void coroutine_fn sd_finish_aiocb(SheepdogAIOCB *acb)
{
if (!acb->canceled) {
qemu_coroutine_enter(acb->coroutine, NULL);
qemu_coroutine_enter(acb->coroutine, NULL);
if (acb->finished) {
*acb->finished = true;
}
qemu_aio_release(acb);
}
/*
* Check whether the specified acb can be canceled
*
* We can cancel aio when any request belonging to the acb is:
* - Not processed by the sheepdog server.
* - Not linked to the inflight queue.
*/
static bool sd_acb_cancelable(const SheepdogAIOCB *acb)
{
BDRVSheepdogState *s = acb->common.bs->opaque;
AIOReq *aioreq;
if (!acb->cancelable) {
return false;
}
QLIST_FOREACH(aioreq, &s->inflight_aio_head, aio_siblings) {
if (aioreq->aiocb == acb) {
return false;
}
}
return true;
}
static void sd_aio_cancel(BlockDriverAIOCB *blockacb)
{
SheepdogAIOCB *acb = (SheepdogAIOCB *)blockacb;
BDRVSheepdogState *s = acb->common.bs->opaque;
AIOReq *aioreq, *next;
bool finished = false;
/*
* Sheepdog cannot cancel the requests which are already sent to
* the servers, so we just complete the request with -EIO here.
*/
acb->ret = -EIO;
qemu_coroutine_enter(acb->coroutine, NULL);
acb->canceled = true;
acb->finished = &finished;
while (!finished) {
if (sd_acb_cancelable(acb)) {
/* Remove outstanding requests from pending and failed queues. */
QLIST_FOREACH_SAFE(aioreq, &s->pending_aio_head, aio_siblings,
next) {
if (aioreq->aiocb == acb) {
free_aio_req(s, aioreq);
}
}
QLIST_FOREACH_SAFE(aioreq, &s->failed_aio_head, aio_siblings,
next) {
if (aioreq->aiocb == acb) {
free_aio_req(s, aioreq);
}
}
assert(acb->nr_pending == 0);
sd_finish_aiocb(acb);
return;
}
qemu_aio_wait();
}
}
static const AIOCBInfo sd_aiocb_info = {
@ -448,7 +510,8 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState *bs, QEMUIOVector *qiov,
acb->nb_sectors = nb_sectors;
acb->aio_done_func = NULL;
acb->canceled = false;
acb->cancelable = true;
acb->finished = NULL;
acb->coroutine = qemu_coroutine_self();
acb->ret = 0;
acb->nr_pending = 0;
@ -489,13 +552,13 @@ static coroutine_fn int send_co_req(int sockfd, SheepdogReq *hdr, void *data,
int ret;
ret = qemu_co_send(sockfd, hdr, sizeof(*hdr));
if (ret < sizeof(*hdr)) {
if (ret != sizeof(*hdr)) {
error_report("failed to send a req, %s", strerror(errno));
return ret;
}
ret = qemu_co_send(sockfd, data, *wlen);
if (ret < *wlen) {
if (ret != *wlen) {
error_report("failed to send a req, %s", strerror(errno));
}
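
Review note: the switch from "ret < sizeof(*hdr)" to "ret != sizeof(*hdr)" matters because ret is a signed int while sizeof yields a size_t. In the comparison the int is converted to size_t, so a negative error return becomes a very large value and the "<" test does not fire. The sketch below spells that conversion out with explicit casts; both helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Old-style check: the cast makes explicit what the implicit
 * conversion does when an int is compared with a size_t. */
static bool short_by_less_than(int ret, size_t want)
{
    return (size_t)ret < want;
}

/* New-style check: anything other than the exact length is a failure,
 * including negative error codes. */
static bool short_by_not_equal(int ret, size_t want)
{
    return (size_t)ret != want;
}
```

With ret = -1 and an 8-byte header, the first helper reports no problem while the second one does; for ordinary short transfers such as ret = 4 both agree.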
@ -509,13 +572,6 @@ static void restart_co_req(void *opaque)
qemu_coroutine_enter(co, NULL);
}
static int have_co_req(void *opaque)
{
/* this handler is set only when there is a pending request, so
* always returns 1. */
return 1;
}
typedef struct SheepdogReqCo {
int sockfd;
SheepdogReq *hdr;
@ -538,17 +594,17 @@ static coroutine_fn void do_co_req(void *opaque)
unsigned int *rlen = srco->rlen;
co = qemu_coroutine_self();
qemu_aio_set_fd_handler(sockfd, NULL, restart_co_req, have_co_req, co);
qemu_aio_set_fd_handler(sockfd, NULL, restart_co_req, co);
ret = send_co_req(sockfd, hdr, data, wlen);
if (ret < 0) {
goto out;
}
qemu_aio_set_fd_handler(sockfd, restart_co_req, NULL, have_co_req, co);
qemu_aio_set_fd_handler(sockfd, restart_co_req, NULL, co);
ret = qemu_co_recv(sockfd, hdr, sizeof(*hdr));
if (ret < sizeof(*hdr)) {
if (ret != sizeof(*hdr)) {
error_report("failed to get a rsp, %s", strerror(errno));
ret = -errno;
goto out;
@ -560,7 +616,7 @@ static coroutine_fn void do_co_req(void *opaque)
if (*rlen) {
ret = qemu_co_recv(sockfd, data, *rlen);
if (ret < *rlen) {
if (ret != *rlen) {
error_report("failed to get the data, %s", strerror(errno));
ret = -errno;
goto out;
@ -570,7 +626,7 @@ static coroutine_fn void do_co_req(void *opaque)
out:
/* there is at most one request for this sockfd, so it is safe to
* set each handler to NULL. */
qemu_aio_set_fd_handler(sockfd, NULL, NULL, NULL, NULL);
qemu_aio_set_fd_handler(sockfd, NULL, NULL, NULL);
srco->ret = ret;
srco->finished = true;
@ -603,11 +659,13 @@ static int do_req(int sockfd, SheepdogReq *hdr, void *data,
return srco.ret;
}
static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
static void coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
struct iovec *iov, int niov, bool create,
enum AIOCBState aiocb_type);
static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
static void coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char *tag);
static int get_sheep_fd(BDRVSheepdogState *s);
static void co_write_request(void *opaque);
static AIOReq *find_pending_req(BDRVSheepdogState *s, uint64_t oid)
{
@ -630,22 +688,59 @@ static void coroutine_fn send_pending_req(BDRVSheepdogState *s, uint64_t oid)
{
AIOReq *aio_req;
SheepdogAIOCB *acb;
int ret;
while ((aio_req = find_pending_req(s, oid)) != NULL) {
acb = aio_req->aiocb;
/* move aio_req from pending list to inflight one */
QLIST_REMOVE(aio_req, aio_siblings);
QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
ret = add_aio_request(s, aio_req, acb->qiov->iov,
acb->qiov->niov, false, acb->aiocb_type);
if (ret < 0) {
error_report("add_aio_request is failed");
free_aio_req(s, aio_req);
if (!acb->nr_pending) {
sd_finish_aiocb(acb);
}
add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov, false,
acb->aiocb_type);
}
}
static coroutine_fn void reconnect_to_sdog(void *opaque)
{
BDRVSheepdogState *s = opaque;
AIOReq *aio_req, *next;
qemu_aio_set_fd_handler(s->fd, NULL, NULL, NULL);
close(s->fd);
s->fd = -1;
/* Wait for outstanding write requests to be completed. */
while (s->co_send != NULL) {
co_write_request(opaque);
}
/* Try to reconnect to the sheepdog server once per second. */
while (s->fd < 0) {
s->fd = get_sheep_fd(s);
if (s->fd < 0) {
DPRINTF("Wait for connection to be established\n");
co_aio_sleep_ns(bdrv_get_aio_context(s->bs), QEMU_CLOCK_REALTIME,
1000000000ULL);
}
};
/*
* Now we have to resend all the requests in the inflight queue. However,
* resend_aioreq() can yield and newly created requests can be added to the
* inflight queue before the coroutine is resumed. To avoid mixing them, we
* have to move all the inflight requests to the failed queue before
* resend_aioreq() is called.
*/
QLIST_FOREACH_SAFE(aio_req, &s->inflight_aio_head, aio_siblings, next) {
QLIST_REMOVE(aio_req, aio_siblings);
QLIST_INSERT_HEAD(&s->failed_aio_head, aio_req, aio_siblings);
}
/* Resend all the failed aio requests. */
while (!QLIST_EMPTY(&s->failed_aio_head)) {
aio_req = QLIST_FIRST(&s->failed_aio_head);
QLIST_REMOVE(aio_req, aio_siblings);
QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
resend_aioreq(s, aio_req);
}
}
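
Review note: reconnect_to_sdog() first moves every inflight request to the failed queue and only then resends, so that requests created while a resend yields are not resent a second time. The sketch below models that two-phase drain with a hand-rolled singly linked list instead of the QLIST macros. All names are hypothetical, and the callback simulates a new request arriving during the first resend.

```c
#include <assert.h>
#include <stddef.h>

struct req {
    int id;
    struct req *next;
};

struct queue {
    struct req *head;
};

static void push(struct queue *q, struct req *r)
{
    r->next = q->head;
    q->head = r;
}

static struct req *pop(struct queue *q)
{
    struct req *r = q->head;
    if (r) {
        q->head = r->next;
        r->next = NULL;
    }
    return r;
}

/* Phase 1: park everything that was inflight on the failed queue.
 * Phase 2: move each parked request back and resend it. Requests the
 * resend callback adds to inflight meanwhile are left alone.
 * Returns the number of requests that were resent. */
static int drain_and_resend(struct queue *inflight, struct queue *failed,
                            void (*resend)(struct queue *, struct req *))
{
    struct req *r;
    int resent = 0;

    while ((r = pop(inflight)) != NULL) {
        push(failed, r);
    }
    while ((r = pop(failed)) != NULL) {
        push(inflight, r);
        resend(inflight, r);
        resent++;
    }
    return resent;
}

/* Simulated resend that lets one brand-new request slip into inflight. */
static struct req late_arrival = { 99, NULL };
static int late_added = 0;

static void resend_with_new_arrival(struct queue *inflight, struct req *r)
{
    (void)r;
    if (!late_added) {
        push(inflight, &late_arrival);
        late_added = 1;
    }
}
```

Starting with two inflight requests, the sketch resends exactly two even though a third request lands on the inflight queue during the first resend.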
@ -665,15 +760,11 @@ static void coroutine_fn aio_read_response(void *opaque)
SheepdogAIOCB *acb;
uint64_t idx;
if (QLIST_EMPTY(&s->inflight_aio_head)) {
goto out;
}
/* read a header */
ret = qemu_co_recv(fd, &rsp, sizeof(rsp));
if (ret < 0) {
if (ret != sizeof(rsp)) {
error_report("failed to get the header, %s", strerror(errno));
goto out;
goto err;
}
/* find the right aio_req from the inflight aio list */
@ -684,7 +775,7 @@ static void coroutine_fn aio_read_response(void *opaque)
}
if (!aio_req) {
error_report("cannot find aio_req %x", rsp.id);
goto out;
goto err;
}
acb = aio_req->aiocb;
@ -722,9 +813,9 @@ static void coroutine_fn aio_read_response(void *opaque)
case AIOCB_READ_UDATA:
ret = qemu_co_recvv(fd, acb->qiov->iov, acb->qiov->niov,
aio_req->iov_offset, rsp.data_length);
if (ret < 0) {
if (ret != rsp.data_length) {
error_report("failed to get the data, %s", strerror(errno));
goto out;
goto err;
}
break;
case AIOCB_FLUSH_CACHE:
@ -755,11 +846,20 @@ static void coroutine_fn aio_read_response(void *opaque)
case SD_RES_SUCCESS:
break;
case SD_RES_READONLY:
ret = resend_aioreq(s, aio_req);
if (ret == SD_RES_SUCCESS) {
goto out;
if (s->inode.vdi_id == oid_to_vid(aio_req->oid)) {
ret = reload_inode(s, 0, "");
if (ret < 0) {
goto err;
}
}
/* fall through */
if (is_data_obj(aio_req->oid)) {
aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
data_oid_to_idx(aio_req->oid));
} else {
aio_req->oid = vid_to_vdi_oid(s->inode.vdi_id);
}
resend_aioreq(s, aio_req);
goto out;
default:
acb->ret = -EIO;
error_report("%s", sd_strerror(rsp.result));
@ -776,6 +876,10 @@ static void coroutine_fn aio_read_response(void *opaque)
}
out:
s->co_recv = NULL;
return;
err:
s->co_recv = NULL;
reconnect_to_sdog(opaque);
}
static void co_read_response(void *opaque)
@ -796,14 +900,6 @@ static void co_write_request(void *opaque)
qemu_coroutine_enter(s->co_send, NULL);
}
static int aio_flush_request(void *opaque)
{
BDRVSheepdogState *s = opaque;
return !QLIST_EMPTY(&s->inflight_aio_head) ||
!QLIST_EMPTY(&s->pending_aio_head);
}
/*
* Return a socket descriptor to read/write objects.
*
@ -819,7 +915,7 @@ static int get_sheep_fd(BDRVSheepdogState *s)
return fd;
}
qemu_aio_set_fd_handler(fd, co_read_response, NULL, aio_flush_request, s);
qemu_aio_set_fd_handler(fd, co_read_response, NULL, s);
return fd;
}
@ -1012,7 +1108,7 @@ out:
return ret;
}
static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
static void coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
struct iovec *iov, int niov, bool create,
enum AIOCBState aiocb_type)
{
@ -1069,36 +1165,30 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
qemu_co_mutex_lock(&s->lock);
s->co_send = qemu_coroutine_self();
qemu_aio_set_fd_handler(s->fd, co_read_response, co_write_request,
aio_flush_request, s);
qemu_aio_set_fd_handler(s->fd, co_read_response, co_write_request, s);
socket_set_cork(s->fd, 1);
/* send a header */
ret = qemu_co_send(s->fd, &hdr, sizeof(hdr));
if (ret < 0) {
qemu_co_mutex_unlock(&s->lock);
if (ret != sizeof(hdr)) {
error_report("failed to send a req, %s", strerror(errno));
return -errno;
goto out;
}
if (wlen) {
ret = qemu_co_sendv(s->fd, iov, niov, aio_req->iov_offset, wlen);
if (ret < 0) {
qemu_co_mutex_unlock(&s->lock);
if (ret != wlen) {
error_report("failed to send a data, %s", strerror(errno));
return -errno;
}
}
out:
socket_set_cork(s->fd, 0);
qemu_aio_set_fd_handler(s->fd, co_read_response, NULL,
aio_flush_request, s);
qemu_aio_set_fd_handler(s->fd, co_read_response, NULL, s);
s->co_send = NULL;
qemu_co_mutex_unlock(&s->lock);
return 0;
}
static int read_write_object(int fd, char *buf, uint64_t oid, int copies,
static int read_write_object(int fd, char *buf, uint64_t oid, uint8_t copies,
unsigned int datalen, uint64_t offset,
bool write, bool create, uint32_t cache_flags)
{
@ -1146,7 +1236,7 @@ static int read_write_object(int fd, char *buf, uint64_t oid, int copies,
}
}
static int read_object(int fd, char *buf, uint64_t oid, int copies,
static int read_object(int fd, char *buf, uint64_t oid, uint8_t copies,
unsigned int datalen, uint64_t offset,
uint32_t cache_flags)
{
@ -1154,7 +1244,7 @@ static int read_object(int fd, char *buf, uint64_t oid, int copies,
false, cache_flags);
}
static int write_object(int fd, char *buf, uint64_t oid, int copies,
static int write_object(int fd, char *buf, uint64_t oid, uint8_t copies,
unsigned int datalen, uint64_t offset, bool create,
uint32_t cache_flags)
{
@ -1198,51 +1288,62 @@ out:
return ret;
}
static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
/* Return true if the specified request is linked to the pending list. */
static bool check_simultaneous_create(BDRVSheepdogState *s, AIOReq *aio_req)
{
AIOReq *areq;
QLIST_FOREACH(areq, &s->inflight_aio_head, aio_siblings) {
if (areq != aio_req && areq->oid == aio_req->oid) {
/*
* Sheepdog cannot handle simultaneous create requests to the same
* object, so we cannot send the request until the previous request
* finishes.
*/
DPRINTF("simultaneous create to %" PRIx64 "\n", aio_req->oid);
aio_req->flags = 0;
aio_req->base_oid = 0;
QLIST_REMOVE(aio_req, aio_siblings);
QLIST_INSERT_HEAD(&s->pending_aio_head, aio_req, aio_siblings);
return true;
}
}
return false;
}
static void coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
{
SheepdogAIOCB *acb = aio_req->aiocb;
bool create = false;
int ret;
ret = reload_inode(s, 0, "");
if (ret < 0) {
return ret;
}
aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
data_oid_to_idx(aio_req->oid));
/* check whether this request becomes a CoW one */
if (acb->aiocb_type == AIOCB_WRITE_UDATA) {
if (acb->aiocb_type == AIOCB_WRITE_UDATA && is_data_obj(aio_req->oid)) {
int idx = data_oid_to_idx(aio_req->oid);
AIOReq *areq;
if (s->inode.data_vdi_id[idx] == 0) {
create = true;
goto out;
}
if (is_data_obj_writable(&s->inode, idx)) {
goto out;
}
/* link to the pending list if there is another CoW request to
* the same object */
QLIST_FOREACH(areq, &s->inflight_aio_head, aio_siblings) {
if (areq != aio_req && areq->oid == aio_req->oid) {
DPRINTF("simultaneous CoW to %" PRIx64 "\n", aio_req->oid);
QLIST_REMOVE(aio_req, aio_siblings);
QLIST_INSERT_HEAD(&s->pending_aio_head, aio_req, aio_siblings);
return SD_RES_SUCCESS;
}
if (check_simultaneous_create(s, aio_req)) {
return;
}
aio_req->base_oid = vid_to_data_oid(s->inode.data_vdi_id[idx], idx);
aio_req->flags |= SD_FLAG_CMD_COW;
if (s->inode.data_vdi_id[idx]) {
aio_req->base_oid = vid_to_data_oid(s->inode.data_vdi_id[idx], idx);
aio_req->flags |= SD_FLAG_CMD_COW;
}
create = true;
}
out:
return add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov,
create, acb->aiocb_type);
if (is_data_obj(aio_req->oid)) {
add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov, create,
acb->aiocb_type);
} else {
struct iovec iov;
iov.iov_base = &s->inode;
iov.iov_len = sizeof(s->inode);
add_aio_request(s, aio_req, &iov, 1, false, AIOCB_WRITE_UDATA);
}
}
/* TODO Convert to fine grained options */
@ -1259,7 +1360,8 @@ static QemuOptsList runtime_opts = {
},
};
static int sd_open(BlockDriverState *bs, QDict *options, int flags)
static int sd_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
int ret, fd;
uint32_t vid = 0;
@ -1271,6 +1373,8 @@ static int sd_open(BlockDriverState *bs, QDict *options, int flags)
Error *local_err = NULL;
const char *filename;
s->bs = bs;
opts = qemu_opts_create_nofail(&runtime_opts);
qemu_opts_absorb_qdict(opts, options, &local_err);
if (error_is_set(&local_err)) {
@ -1284,6 +1388,7 @@ static int sd_open(BlockDriverState *bs, QDict *options, int flags)
QLIST_INIT(&s->inflight_aio_head);
QLIST_INIT(&s->pending_aio_head);
QLIST_INIT(&s->failed_aio_head);
s->fd = -1;
memset(vdi, 0, sizeof(vdi));
@ -1350,7 +1455,7 @@ static int sd_open(BlockDriverState *bs, QDict *options, int flags)
g_free(buf);
return 0;
out:
qemu_aio_set_fd_handler(s->fd, NULL, NULL, NULL, NULL);
qemu_aio_set_fd_handler(s->fd, NULL, NULL, NULL);
if (s->fd >= 0) {
closesocket(s->fd);
}
@ -1360,7 +1465,8 @@ out:
}
static int do_sd_create(BDRVSheepdogState *s, char *filename, int64_t vdi_size,
uint32_t base_vid, uint32_t *vdi_id, int snapshot)
uint32_t base_vid, uint32_t *vdi_id, int snapshot,
uint8_t copy_policy)
{
SheepdogVdiReq hdr;
SheepdogVdiRsp *rsp = (SheepdogVdiRsp *)&hdr;
@ -1390,6 +1496,7 @@ static int do_sd_create(BDRVSheepdogState *s, char *filename, int64_t vdi_size,
hdr.data_length = wlen;
hdr.vdi_size = vdi_size;
hdr.copy_policy = copy_policy;
ret = do_req(fd, (SheepdogReq *)&hdr, buf, &wlen, &rlen);
@ -1417,10 +1524,13 @@ static int sd_prealloc(const char *filename)
uint32_t idx, max_idx;
int64_t vdi_size;
void *buf = g_malloc0(SD_DATA_OBJ_SIZE);
Error *local_err = NULL;
int ret;
ret = bdrv_file_open(&bs, filename, NULL, BDRV_O_RDWR);
ret = bdrv_file_open(&bs, filename, NULL, BDRV_O_RDWR, &local_err);
if (ret < 0) {
qerror_report_err(local_err);
error_free(local_err);
goto out;
}
@ -1447,14 +1557,15 @@ static int sd_prealloc(const char *filename)
}
out:
if (bs) {
bdrv_delete(bs);
bdrv_unref(bs);
}
g_free(buf);
return ret;
}
static int sd_create(const char *filename, QEMUOptionParameter *options)
static int sd_create(const char *filename, QEMUOptionParameter *options,
Error **errp)
{
int ret = 0;
uint32_t vid = 0, base_vid = 0;
@ -1464,6 +1575,7 @@ static int sd_create(const char *filename, QEMUOptionParameter *options)
char vdi[SD_MAX_VDI_LEN], tag[SD_MAX_VDI_TAG_LEN];
uint32_t snapid;
bool prealloc = false;
Error *local_err = NULL;
s = g_malloc0(sizeof(BDRVSheepdogState));
@ -1517,8 +1629,10 @@ static int sd_create(const char *filename, QEMUOptionParameter *options)
goto out;
}
ret = bdrv_file_open(&bs, backing_file, NULL, 0);
ret = bdrv_file_open(&bs, backing_file, NULL, 0, &local_err);
if (ret < 0) {
qerror_report_err(local_err);
error_free(local_err);
goto out;
}
@ -1526,16 +1640,17 @@ static int sd_create(const char *filename, QEMUOptionParameter *options)
if (!is_snapshot(&s->inode)) {
error_report("cannot clone from a non snapshot vdi");
bdrv_delete(bs);
bdrv_unref(bs);
ret = -EINVAL;
goto out;
}
base_vid = s->inode.vdi_id;
bdrv_delete(bs);
bdrv_unref(bs);
}
ret = do_sd_create(s, vdi, vdi_size, base_vid, &vid, 0);
/* TODO: allow users to specify copy number */
ret = do_sd_create(s, vdi, vdi_size, base_vid, &vid, 0, 0);
if (!prealloc || ret) {
goto out;
}
@ -1578,7 +1693,7 @@ static void sd_close(BlockDriverState *bs)
error_report("%s, %s", sd_strerror(rsp->result), s->name);
}
qemu_aio_set_fd_handler(s->fd, NULL, NULL, NULL, NULL);
qemu_aio_set_fd_handler(s->fd, NULL, NULL, NULL);
closesocket(s->fd);
g_free(s->host_spec);
}
@ -1630,7 +1745,6 @@ static int sd_truncate(BlockDriverState *bs, int64_t offset)
*/
static void coroutine_fn sd_write_done(SheepdogAIOCB *acb)
{
int ret;
BDRVSheepdogState *s = acb->common.bs->opaque;
struct iovec iov;
AIOReq *aio_req;
@ -1652,18 +1766,13 @@ static void coroutine_fn sd_write_done(SheepdogAIOCB *acb)
aio_req = alloc_aio_req(s, acb, vid_to_vdi_oid(s->inode.vdi_id),
data_len, offset, 0, 0, offset);
QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
ret = add_aio_request(s, aio_req, &iov, 1, false, AIOCB_WRITE_UDATA);
if (ret) {
free_aio_req(s, aio_req);
acb->ret = -EIO;
goto out;
}
add_aio_request(s, aio_req, &iov, 1, false, AIOCB_WRITE_UDATA);
acb->aio_done_func = sd_finish_aiocb;
acb->aiocb_type = AIOCB_WRITE_UDATA;
return;
}
out:
sd_finish_aiocb(acb);
}
@ -1725,7 +1834,7 @@ static int sd_create_branch(BDRVSheepdogState *s)
*/
deleted = sd_delete(s);
ret = do_sd_create(s, s->name, s->inode.vdi_size, s->inode.vdi_id, &vid,
!deleted);
!deleted, s->inode.copy_policy);
if (ret) {
goto out;
}
@ -1849,35 +1958,16 @@ static int coroutine_fn sd_co_rw_vector(void *p)
}
aio_req = alloc_aio_req(s, acb, oid, len, offset, flags, old_oid, done);
QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
if (create) {
AIOReq *areq;
QLIST_FOREACH(areq, &s->inflight_aio_head, aio_siblings) {
if (areq->oid == oid) {
/*
* Sheepdog cannot handle simultaneous create
* requests to the same object. So we cannot send
* the request until the previous request
* finishes.
*/
aio_req->flags = 0;
aio_req->base_oid = 0;
QLIST_INSERT_HEAD(&s->pending_aio_head, aio_req,
aio_siblings);
goto done;
}
if (check_simultaneous_create(s, aio_req)) {
goto done;
}
}
QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
ret = add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov,
create, acb->aiocb_type);
if (ret < 0) {
error_report("add_aio_request is failed");
free_aio_req(s, aio_req);
acb->ret = -EIO;
goto out;
}
add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov, create,
acb->aiocb_type);
done:
offset = 0;
idx++;
@ -1945,7 +2035,6 @@ static int coroutine_fn sd_co_flush_to_disk(BlockDriverState *bs)
BDRVSheepdogState *s = bs->opaque;
SheepdogAIOCB *acb;
AIOReq *aio_req;
int ret;
if (s->cache_flags != SD_FLAG_CMD_CACHE) {
return 0;
@ -1958,13 +2047,7 @@ static int coroutine_fn sd_co_flush_to_disk(BlockDriverState *bs)
aio_req = alloc_aio_req(s, acb, vid_to_vdi_oid(s->inode.vdi_id),
0, 0, 0, 0, 0);
QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);
ret = add_aio_request(s, aio_req, NULL, 0, false, acb->aiocb_type);
if (ret < 0) {
error_report("add_aio_request is failed");
free_aio_req(s, aio_req);
qemu_aio_release(acb);
return ret;
}
add_aio_request(s, aio_req, NULL, 0, false, acb->aiocb_type);
qemu_coroutine_yield();
return acb->ret;
@ -2015,7 +2098,7 @@ static int sd_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
}
ret = do_sd_create(s, s->name, s->inode.vdi_size, s->inode.vdi_id, &new_vid,
1);
1, s->inode.copy_policy);
if (ret < 0) {
error_report("failed to create inode for snapshot. %s",
strerror(errno));
@ -2089,7 +2172,10 @@ out:
return ret;
}
static int sd_snapshot_delete(BlockDriverState *bs, const char *snapshot_id)
static int sd_snapshot_delete(BlockDriverState *bs,
const char *snapshot_id,
const char *name,
Error **errp)
{
/* FIXME: Delete specified snapshot id. */
return 0;
@ -2287,9 +2373,9 @@ static coroutine_fn int sd_co_discard(BlockDriverState *bs, int64_t sector_num,
return acb->ret;
}
static coroutine_fn int
sd_co_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
int *pnum)
static coroutine_fn int64_t
sd_co_get_block_status(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
int *pnum)
{
BDRVSheepdogState *s = bs->opaque;
SheepdogInode *inode = &s->inode;
@ -2297,7 +2383,7 @@ sd_co_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
end = DIV_ROUND_UP((sector_num + nb_sectors) *
BDRV_SECTOR_SIZE, SD_DATA_OBJ_SIZE);
unsigned long idx;
int ret = 1;
int64_t ret = BDRV_BLOCK_DATA;
for (idx = start; idx < end; idx++) {
if (inode->data_vdi_id[idx] == 0) {
@ -2344,6 +2430,7 @@ static BlockDriver bdrv_sheepdog = {
.format_name = "sheepdog",
.protocol_name = "sheepdog",
.instance_size = sizeof(BDRVSheepdogState),
.bdrv_needs_filename = true,
.bdrv_file_open = sd_open,
.bdrv_close = sd_close,
.bdrv_create = sd_create,
@ -2355,7 +2442,7 @@ static BlockDriver bdrv_sheepdog = {
.bdrv_co_writev = sd_co_writev,
.bdrv_co_flush_to_disk = sd_co_flush_to_disk,
.bdrv_co_discard = sd_co_discard,
.bdrv_co_is_allocated = sd_co_is_allocated,
.bdrv_co_get_block_status = sd_co_get_block_status,
.bdrv_snapshot_create = sd_snapshot_create,
.bdrv_snapshot_goto = sd_snapshot_goto,
@ -2372,6 +2459,7 @@ static BlockDriver bdrv_sheepdog_tcp = {
.format_name = "sheepdog",
.protocol_name = "sheepdog+tcp",
.instance_size = sizeof(BDRVSheepdogState),
.bdrv_needs_filename = true,
.bdrv_file_open = sd_open,
.bdrv_close = sd_close,
.bdrv_create = sd_create,
@ -2383,7 +2471,7 @@ static BlockDriver bdrv_sheepdog_tcp = {
.bdrv_co_writev = sd_co_writev,
.bdrv_co_flush_to_disk = sd_co_flush_to_disk,
.bdrv_co_discard = sd_co_discard,
.bdrv_co_is_allocated = sd_co_is_allocated,
.bdrv_co_get_block_status = sd_co_get_block_status,
.bdrv_snapshot_create = sd_snapshot_create,
.bdrv_snapshot_goto = sd_snapshot_goto,
@ -2400,6 +2488,7 @@ static BlockDriver bdrv_sheepdog_unix = {
.format_name = "sheepdog",
.protocol_name = "sheepdog+unix",
.instance_size = sizeof(BDRVSheepdogState),
.bdrv_needs_filename = true,
.bdrv_file_open = sd_open,
.bdrv_close = sd_close,
.bdrv_create = sd_create,
@ -2411,7 +2500,7 @@ static BlockDriver bdrv_sheepdog_unix = {
.bdrv_co_writev = sd_co_writev,
.bdrv_co_flush_to_disk = sd_co_flush_to_disk,
.bdrv_co_discard = sd_co_discard,
.bdrv_co_is_allocated = sd_co_is_allocated,
.bdrv_co_get_block_status = sd_co_get_block_status,
.bdrv_snapshot_create = sd_snapshot_create,
.bdrv_snapshot_goto = sd_snapshot_goto,


@ -48,6 +48,79 @@ int bdrv_snapshot_find(BlockDriverState *bs, QEMUSnapshotInfo *sn_info,
return ret;
}
/**
* Look up an internal snapshot by @id and @name.
* @bs: block device to search
* @id: unique snapshot ID, or NULL
* @name: snapshot name, or NULL
* @sn_info: location to store information on the snapshot found
* @errp: location to store error, set only on failure
*
* This function traverses the snapshot list in @bs to find a matching
* snapshot. @id and @name are the matching conditions:
* If both @id and @name are specified, find the first one with id @id and
* name @name.
* If only @id is specified, find the first one with id @id.
* If only @name is specified, find the first one with name @name.
* If none is specified, abort().
*
* Returns: true when a snapshot is found and @sn_info is filled, false
* on error or when not found. If all operations succeed but no matching
* snapshot is found, @errp will NOT be set.
*/
bool bdrv_snapshot_find_by_id_and_name(BlockDriverState *bs,
const char *id,
const char *name,
QEMUSnapshotInfo *sn_info,
Error **errp)
{
QEMUSnapshotInfo *sn_tab, *sn;
int nb_sns, i;
bool ret = false;
assert(id || name);
nb_sns = bdrv_snapshot_list(bs, &sn_tab);
if (nb_sns < 0) {
error_setg_errno(errp, -nb_sns, "Failed to get a snapshot list");
return false;
} else if (nb_sns == 0) {
return false;
}
if (id && name) {
for (i = 0; i < nb_sns; i++) {
sn = &sn_tab[i];
if (!strcmp(sn->id_str, id) && !strcmp(sn->name, name)) {
*sn_info = *sn;
ret = true;
break;
}
}
} else if (id) {
for (i = 0; i < nb_sns; i++) {
sn = &sn_tab[i];
if (!strcmp(sn->id_str, id)) {
*sn_info = *sn;
ret = true;
break;
}
}
} else if (name) {
for (i = 0; i < nb_sns; i++) {
sn = &sn_tab[i];
if (!strcmp(sn->name, name)) {
*sn_info = *sn;
ret = true;
break;
}
}
}
g_free(sn_tab);
return ret;
}
int bdrv_can_snapshot(BlockDriverState *bs)
{
BlockDriver *drv = bs->drv;
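
The matching rules documented for bdrv_snapshot_find_by_id_and_name() can be sketched outside QEMU as follows. This is a minimal illustration under stated assumptions, not the QEMU implementation: `DemoSnapshot` and `demo_find_snapshot` are hypothetical names, and the table is a plain array rather than the result of bdrv_snapshot_list().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for QEMUSnapshotInfo; only the two keys are kept. */
typedef struct DemoSnapshot {
    const char *id_str;
    const char *name;
} DemoSnapshot;

/*
 * Return true and copy the first matching entry into *out.
 * A NULL id or name acts as a wildcard; at least one must be non-NULL.
 */
static bool demo_find_snapshot(const DemoSnapshot *tab, size_t n,
                               const char *id, const char *name,
                               DemoSnapshot *out)
{
    size_t i;

    assert(id || name);
    for (i = 0; i < n; i++) {
        bool id_ok = !id || strcmp(tab[i].id_str, id) == 0;
        bool name_ok = !name || strcmp(tab[i].name, name) == 0;

        if (id_ok && name_ok) {
            *out = tab[i];
            return true;
        }
    }
    return false;
}
```

Folding the three loops of the patch into one keeps the same first-match behaviour; the patch spells the cases out separately, which may be easier to read against the comment.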
@ -97,9 +170,9 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
if (bs->file) {
drv->bdrv_close(bs);
ret = bdrv_snapshot_goto(bs->file, snapshot_id);
open_ret = drv->bdrv_open(bs, NULL, bs->open_flags);
open_ret = drv->bdrv_open(bs, NULL, bs->open_flags, NULL);
if (open_ret < 0) {
bdrv_delete(bs->file);
bdrv_unref(bs->file);
bs->drv = NULL;
return open_ret;
}
@ -109,21 +182,73 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
return -ENOTSUP;
}
int bdrv_snapshot_delete(BlockDriverState *bs, const char *snapshot_id)
/**
* Delete an internal snapshot by @snapshot_id and @name.
* @bs: block device used in the operation
* @snapshot_id: unique snapshot ID, or NULL
* @name: snapshot name, or NULL
* @errp: location to store error
*
* If both @snapshot_id and @name are specified, delete the first one with
* id @snapshot_id and name @name.
* If only @snapshot_id is specified, delete the first one with id
* @snapshot_id.
* If only @name is specified, delete the first one with name @name.
* If none is specified, return -EINVAL.
*
* Returns: 0 on success, -errno on failure. If @bs is not inserted, return
* -ENOMEDIUM. If @snapshot_id and @name are both NULL, return -EINVAL. If @bs
* does not support internal snapshot deletion, return -ENOTSUP. If @bs does
* not support parameter @snapshot_id or @name, or one of them is not correctly
* specified, return -EINVAL. If @bs can't find one matching @snapshot_id and
* @name, return -ENOENT. If @errp != NULL, it will always be filled with an
* error message on failure.
*/
int bdrv_snapshot_delete(BlockDriverState *bs,
const char *snapshot_id,
const char *name,
Error **errp)
{
BlockDriver *drv = bs->drv;
if (!drv) {
error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, bdrv_get_device_name(bs));
return -ENOMEDIUM;
}
if (!snapshot_id && !name) {
error_setg(errp, "snapshot_id and name are both NULL");
return -EINVAL;
}
if (drv->bdrv_snapshot_delete) {
return drv->bdrv_snapshot_delete(bs, snapshot_id);
return drv->bdrv_snapshot_delete(bs, snapshot_id, name, errp);
}
if (bs->file) {
return bdrv_snapshot_delete(bs->file, snapshot_id);
return bdrv_snapshot_delete(bs->file, snapshot_id, name, errp);
}
error_set(errp, QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED,
drv->format_name, bdrv_get_device_name(bs),
"internal snapshot deletion");
return -ENOTSUP;
}
void bdrv_snapshot_delete_by_id_or_name(BlockDriverState *bs,
const char *id_or_name,
Error **errp)
{
int ret;
Error *local_err = NULL;
ret = bdrv_snapshot_delete(bs, id_or_name, NULL, &local_err);
if (ret == -ENOENT || ret == -EINVAL) {
error_free(local_err);
local_err = NULL;
ret = bdrv_snapshot_delete(bs, NULL, id_or_name, &local_err);
}
if (ret < 0) {
error_propagate(errp, local_err);
}
}
int bdrv_snapshot_list(BlockDriverState *bs,
QEMUSnapshotInfo **psn_info)
{
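
The fallback in bdrv_snapshot_delete_by_id_or_name() (try the string as an ID first, then as a name if that yields -ENOENT or -EINVAL) can be sketched like this. The names `DemoEntry`, `demo_delete` and `demo_delete_by_id_or_name` are hypothetical, the table is an in-memory array, and the sketch returns the error code directly instead of propagating an Error object as the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical snapshot record with a tombstone flag. */
typedef struct DemoEntry {
    const char *id_str;
    const char *name;
    int deleted;
} DemoEntry;

/* Delete the first live entry matching the non-NULL keys. */
static int demo_delete(DemoEntry *tab, size_t n,
                       const char *id, const char *name)
{
    size_t i;

    if (!id && !name) {
        return -EINVAL;
    }
    for (i = 0; i < n; i++) {
        if (tab[i].deleted) {
            continue;
        }
        if (id && strcmp(tab[i].id_str, id) != 0) {
            continue;
        }
        if (name && strcmp(tab[i].name, name) != 0) {
            continue;
        }
        tab[i].deleted = 1;
        return 0;
    }
    return -ENOENT;
}

/* Treat key as an ID first; fall back to treating it as a name. */
static int demo_delete_by_id_or_name(DemoEntry *tab, size_t n, const char *key)
{
    int ret = demo_delete(tab, n, key, NULL);

    if (ret == -ENOENT || ret == -EINVAL) {
        ret = demo_delete(tab, n, NULL, key);
    }
    return ret;
}
```

One consequence worth noting: when a string is both the ID of one snapshot and the name of another, the ID match wins.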


@ -608,7 +608,8 @@ static int connect_to_ssh(BDRVSSHState *s, QDict *options,
return ret;
}
static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags)
static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags,
Error **errp)
{
BDRVSSHState *s = bs->opaque;
int ret;
@ -650,7 +651,8 @@ static QEMUOptionParameter ssh_create_options[] = {
{ NULL }
};
static int ssh_create(const char *filename, QEMUOptionParameter *options)
static int ssh_create(const char *filename, QEMUOptionParameter *options,
Error **errp)
{
int r, ret;
Error *local_err = NULL;
@ -740,14 +742,6 @@ static void restart_coroutine(void *opaque)
qemu_coroutine_enter(co, NULL);
}
/* Always true because when we have called set_fd_handler there is
* always a request being processed.
*/
static int return_true(void *opaque)
{
return 1;
}
static coroutine_fn void set_fd_handler(BDRVSSHState *s)
{
int r;
@ -766,13 +760,13 @@ static coroutine_fn void set_fd_handler(BDRVSSHState *s)
DPRINTF("s->sock=%d rd_handler=%p wr_handler=%p", s->sock,
rd_handler, wr_handler);
qemu_aio_set_fd_handler(s->sock, rd_handler, wr_handler, return_true, co);
qemu_aio_set_fd_handler(s->sock, rd_handler, wr_handler, co);
}
static coroutine_fn void clear_fd_handler(BDRVSSHState *s)
{
DPRINTF("s->sock=%d", s->sock);
qemu_aio_set_fd_handler(s->sock, NULL, NULL, NULL, NULL);
qemu_aio_set_fd_handler(s->sock, NULL, NULL, NULL);
}
/* A non-blocking call returned EAGAIN, so yield, ensuring the


@ -57,6 +57,11 @@ static void close_unused_images(BlockDriverState *top, BlockDriverState *base,
BlockDriverState *intermediate;
intermediate = top->backing_hd;
/* Must assign before bdrv_delete() to prevent traversing dangling pointer
* while we delete backing image instances.
*/
top->backing_hd = base;
while (intermediate) {
BlockDriverState *unused;
@ -68,9 +73,8 @@ static void close_unused_images(BlockDriverState *top, BlockDriverState *base,
unused = intermediate;
intermediate = intermediate->backing_hd;
unused->backing_hd = NULL;
bdrv_delete(unused);
bdrv_unref(unused);
}
top->backing_hd = base;
}
static void coroutine_fn stream_run(void *opaque)
@ -110,21 +114,22 @@ wait:
/* Note that even when no rate limit is applied we need to yield
* with no pending I/O here so that bdrv_drain_all() returns.
*/
block_job_sleep_ns(&s->common, rt_clock, delay_ns);
block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
if (block_job_is_cancelled(&s->common)) {
break;
}
ret = bdrv_co_is_allocated(bs, sector_num,
STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE, &n);
copy = false;
ret = bdrv_is_allocated(bs, sector_num,
STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE, &n);
if (ret == 1) {
/* Allocated in the top, no need to copy. */
copy = false;
} else {
} else if (ret >= 0) {
/* Copy if allocated in the intermediate images. Limit to the
* known-unallocated area [sector_num, sector_num+n). */
ret = bdrv_co_is_allocated_above(bs->backing_hd, base,
sector_num, n, &n);
ret = bdrv_is_allocated_above(bs->backing_hd, base,
sector_num, n, &n);
/* Finish early if end of backing file has been reached */
if (ret == 0 && n == 0) {
@ -134,7 +139,7 @@ wait:
copy = (ret == 1);
}
trace_stream_one_iteration(s, sector_num, n, ret);
if (ret >= 0 && copy) {
if (copy) {
if (s->common.speed) {
delay_ns = ratelimit_calculate_delay(&s->limit, n);
if (delay_ns > 0) {
@ -198,9 +203,9 @@ static void stream_set_speed(BlockJob *job, int64_t speed, Error **errp)
ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
}
static const BlockJobType stream_job_type = {
static const BlockJobDriver stream_job_driver = {
.instance_size = sizeof(StreamBlockJob),
.job_type = "stream",
.job_type = BLOCK_JOB_TYPE_STREAM,
.set_speed = stream_set_speed,
};
@ -219,7 +224,7 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
return;
}
s = block_job_create(&stream_job_type, bs, speed, cb, opaque, errp);
s = block_job_create(&stream_job_driver, bs, speed, cb, opaque, errp);
if (!s) {
return;
}


@ -165,7 +165,7 @@ typedef struct {
uuid_t uuid_link;
uuid_t uuid_parent;
uint64_t unused2[7];
} VdiHeader;
} QEMU_PACKED VdiHeader;
typedef struct {
/* The block map entries are little endian (even in memory). */
@ -364,7 +364,8 @@ static int vdi_probe(const uint8_t *buf, int buf_size, const char *filename)
return result;
}
static int vdi_open(BlockDriverState *bs, QDict *options, int flags)
static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVVdiState *s = bs->opaque;
VdiHeader header;
@ -470,7 +471,7 @@ static int vdi_reopen_prepare(BDRVReopenState *state,
return 0;
}
static int coroutine_fn vdi_co_is_allocated(BlockDriverState *bs,
static int64_t coroutine_fn vdi_co_get_block_status(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, int *pnum)
{
/* TODO: Check for too large sector_num (in bdrv_is_allocated or here). */
@ -479,12 +480,23 @@ static int coroutine_fn vdi_co_is_allocated(BlockDriverState *bs,
size_t sector_in_block = sector_num % s->block_sectors;
int n_sectors = s->block_sectors - sector_in_block;
uint32_t bmap_entry = le32_to_cpu(s->bmap[bmap_index]);
uint64_t offset;
int result;
logout("%p, %" PRId64 ", %d, %p\n", bs, sector_num, nb_sectors, pnum);
if (n_sectors > nb_sectors) {
n_sectors = nb_sectors;
}
*pnum = n_sectors;
return VDI_IS_ALLOCATED(bmap_entry);
result = VDI_IS_ALLOCATED(bmap_entry);
if (!result) {
return 0;
}
offset = s->header.offset_data +
(uint64_t)bmap_entry * s->block_size +
sector_in_block * SECTOR_SIZE;
return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | offset;
}
static int vdi_co_read(BlockDriverState *bs,
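
The return value built in vdi_co_get_block_status() packs a byte offset and status flags into one 64-bit integer. A minimal sketch of that arithmetic follows. The flag values, the `DEMO_UNALLOCATED` sentinel and the function name are assumptions made for illustration and are not taken from the QEMU headers; the sketch also assumes the data offset is sector-aligned so the low bits are free for flags.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SECTOR_SIZE        512
#define DEMO_BLOCK_DATA         0x01               /* hypothetical flag value */
#define DEMO_BLOCK_OFFSET_VALID 0x04               /* hypothetical flag value */
#define DEMO_UNALLOCATED        UINT32_C(0xffffffff) /* hypothetical sentinel */

/*
 * Return 0 for an unallocated block; otherwise return the byte offset of
 * the requested sector in the image file, OR-ed with the status flags.
 */
static int64_t demo_block_status(uint64_t offset_data, uint32_t block_size,
                                 uint32_t bmap_entry, uint32_t sector_in_block)
{
    uint64_t offset;

    if (bmap_entry == DEMO_UNALLOCATED) {
        return 0;
    }
    /* Widen before multiplying so large images do not overflow 32 bits. */
    offset = offset_data +
             (uint64_t)bmap_entry * block_size +
             (uint64_t)sector_in_block * DEMO_SECTOR_SIZE;
    return (int64_t)(offset | DEMO_BLOCK_DATA | DEMO_BLOCK_OFFSET_VALID);
}
```

With a 2 MiB data offset, 1 MiB blocks, block map entry 3 and sector 5 within the block, the offset is 2097152 + 3145728 + 2560 = 5245440 bytes before the flags are added.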
@ -633,7 +645,8 @@ static int vdi_co_write(BlockDriverState *bs,
return ret;
}
static int vdi_create(const char *filename, QEMUOptionParameter *options)
static int vdi_create(const char *filename, QEMUOptionParameter *options,
Error **errp)
{
int fd;
int result = 0;
@ -780,7 +793,7 @@ static BlockDriver bdrv_vdi = {
.bdrv_reopen_prepare = vdi_reopen_prepare,
.bdrv_create = vdi_create,
.bdrv_has_zero_init = bdrv_has_zero_init_1,
.bdrv_co_is_allocated = vdi_co_is_allocated,
.bdrv_co_get_block_status = vdi_co_get_block_status,
.bdrv_make_empty = vdi_make_empty,
.bdrv_read = vdi_co_read,

216
block/vhdx-endian.c Normal file

@ -0,0 +1,216 @@
/*
* Block driver for Hyper-V VHDX Images
*
* Copyright (c) 2013 Red Hat, Inc.,
*
* Authors:
* Jeff Cody <jcody@redhat.com>
*
* This is based on the "VHDX Format Specification v1.00", published 8/25/2012
* by Microsoft:
* https://www.microsoft.com/en-us/download/details.aspx?id=34750
*
* This work is licensed under the terms of the GNU LGPL, version 2 or later.
* See the COPYING.LIB file in the top-level directory.
*
*/
#include "qemu-common.h"
#include "block/block_int.h"
#include "block/vhdx.h"
#include <uuid/uuid.h>
/*
* All the VHDX formats on disk are little endian - the following
* are helper import/export functions to correctly convert
* endianness from disk read to native cpu format, and back again.
*/
/* VHDX File Header */
void vhdx_header_le_import(VHDXHeader *h)
{
assert(h != NULL);
le32_to_cpus(&h->signature);
le32_to_cpus(&h->checksum);
le64_to_cpus(&h->sequence_number);
leguid_to_cpus(&h->file_write_guid);
leguid_to_cpus(&h->data_write_guid);
leguid_to_cpus(&h->log_guid);
le16_to_cpus(&h->log_version);
le16_to_cpus(&h->version);
le32_to_cpus(&h->log_length);
le64_to_cpus(&h->log_offset);
}
void vhdx_header_le_export(VHDXHeader *orig_h, VHDXHeader *new_h)
{
assert(orig_h != NULL);
assert(new_h != NULL);
new_h->signature = cpu_to_le32(orig_h->signature);
new_h->checksum = cpu_to_le32(orig_h->checksum);
new_h->sequence_number = cpu_to_le64(orig_h->sequence_number);
new_h->file_write_guid = orig_h->file_write_guid;
new_h->data_write_guid = orig_h->data_write_guid;
new_h->log_guid = orig_h->log_guid;
cpu_to_leguids(&new_h->file_write_guid);
cpu_to_leguids(&new_h->data_write_guid);
cpu_to_leguids(&new_h->log_guid);
new_h->log_version = cpu_to_le16(orig_h->log_version);
new_h->version = cpu_to_le16(orig_h->version);
new_h->log_length = cpu_to_le32(orig_h->log_length);
new_h->log_offset = cpu_to_le64(orig_h->log_offset);
}
/* VHDX Log Headers */
void vhdx_log_desc_le_import(VHDXLogDescriptor *d)
{
assert(d != NULL);
le32_to_cpus(&d->signature);
le32_to_cpus(&d->trailing_bytes);
le64_to_cpus(&d->leading_bytes);
le64_to_cpus(&d->file_offset);
le64_to_cpus(&d->sequence_number);
}
void vhdx_log_desc_le_export(VHDXLogDescriptor *d)
{
assert(d != NULL);
cpu_to_le32s(&d->signature);
cpu_to_le32s(&d->trailing_bytes);
cpu_to_le64s(&d->leading_bytes);
cpu_to_le64s(&d->file_offset);
cpu_to_le64s(&d->sequence_number);
}
void vhdx_log_data_le_export(VHDXLogDataSector *d)
{
assert(d != NULL);
cpu_to_le32s(&d->data_signature);
cpu_to_le32s(&d->sequence_high);
cpu_to_le32s(&d->sequence_low);
}
void vhdx_log_entry_hdr_le_import(VHDXLogEntryHeader *hdr)
{
assert(hdr != NULL);
le32_to_cpus(&hdr->signature);
le32_to_cpus(&hdr->checksum);
le32_to_cpus(&hdr->entry_length);
le32_to_cpus(&hdr->tail);
le64_to_cpus(&hdr->sequence_number);
le32_to_cpus(&hdr->descriptor_count);
leguid_to_cpus(&hdr->log_guid);
le64_to_cpus(&hdr->flushed_file_offset);
le64_to_cpus(&hdr->last_file_offset);
}
void vhdx_log_entry_hdr_le_export(VHDXLogEntryHeader *hdr)
{
assert(hdr != NULL);
cpu_to_le32s(&hdr->signature);
cpu_to_le32s(&hdr->checksum);
cpu_to_le32s(&hdr->entry_length);
cpu_to_le32s(&hdr->tail);
cpu_to_le64s(&hdr->sequence_number);
cpu_to_le32s(&hdr->descriptor_count);
cpu_to_leguids(&hdr->log_guid);
cpu_to_le64s(&hdr->flushed_file_offset);
cpu_to_le64s(&hdr->last_file_offset);
}
/* Region table entries */
void vhdx_region_header_le_import(VHDXRegionTableHeader *hdr)
{
assert(hdr != NULL);
le32_to_cpus(&hdr->signature);
le32_to_cpus(&hdr->checksum);
le32_to_cpus(&hdr->entry_count);
}
void vhdx_region_header_le_export(VHDXRegionTableHeader *hdr)
{
assert(hdr != NULL);
cpu_to_le32s(&hdr->signature);
cpu_to_le32s(&hdr->checksum);
cpu_to_le32s(&hdr->entry_count);
}
void vhdx_region_entry_le_import(VHDXRegionTableEntry *e)
{
assert(e != NULL);
leguid_to_cpus(&e->guid);
le64_to_cpus(&e->file_offset);
le32_to_cpus(&e->length);
le32_to_cpus(&e->data_bits);
}
void vhdx_region_entry_le_export(VHDXRegionTableEntry *e)
{
assert(e != NULL);
cpu_to_leguids(&e->guid);
cpu_to_le64s(&e->file_offset);
cpu_to_le32s(&e->length);
cpu_to_le32s(&e->data_bits);
}
/* Metadata headers & table */
void vhdx_metadata_header_le_import(VHDXMetadataTableHeader *hdr)
{
assert(hdr != NULL);
le64_to_cpus(&hdr->signature);
le16_to_cpus(&hdr->entry_count);
}
void vhdx_metadata_header_le_export(VHDXMetadataTableHeader *hdr)
{
assert(hdr != NULL);
cpu_to_le64s(&hdr->signature);
cpu_to_le16s(&hdr->entry_count);
}
void vhdx_metadata_entry_le_import(VHDXMetadataTableEntry *e)
{
assert(e != NULL);
leguid_to_cpus(&e->item_id);
le32_to_cpus(&e->offset);
le32_to_cpus(&e->length);
le32_to_cpus(&e->data_bits);
}
void vhdx_metadata_entry_le_export(VHDXMetadataTableEntry *e)
{
assert(e != NULL);
cpu_to_leguids(&e->item_id);
cpu_to_le32s(&e->offset);
cpu_to_le32s(&e->length);
cpu_to_le32s(&e->data_bits);
}
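
The import helpers above convert little-endian on-disk fields to host order in place. As a sketch of the same idea without the QEMU byte-swap helpers, the following decodes a GUID from raw bytes; this is a different technique (byte-wise decode rather than in-place swap). `DemoGuid` and the `demo_*` functions are hypothetical, and the trailing eight bytes are assumed to be a plain byte array that needs no conversion.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of a Microsoft-style GUID layout. */
typedef struct DemoGuid {
    uint32_t data1;
    uint16_t data2;
    uint16_t data3;
    uint8_t  data4[8];
} DemoGuid;

/* Assemble a 16-bit value from two little-endian bytes. */
static uint16_t demo_load_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Assemble a 32-bit value from four little-endian bytes. */
static uint32_t demo_load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the 16 on-disk bytes into host byte order. */
static void demo_guid_import(DemoGuid *g, const uint8_t raw[16])
{
    g->data1 = demo_load_le32(raw);
    g->data2 = demo_load_le16(raw + 4);
    g->data3 = demo_load_le16(raw + 6);
    memcpy(g->data4, raw + 8, sizeof(g->data4));
}
```

Decoding byte by byte gives the same result on little-endian and big-endian hosts, at the cost of a copy that the in-place helpers avoid.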

1010
block/vhdx-log.c Normal file

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -6,9 +6,9 @@
* Authors:
* Jeff Cody <jcody@redhat.com>
*
* This is based on the "VHDX Format Specification v0.95", published 4/12/2012
* This is based on the "VHDX Format Specification v1.00", published 8/25/2012
* by Microsoft:
* https://www.microsoft.com/en-us/download/details.aspx?id=29681
* https://www.microsoft.com/en-us/download/details.aspx?id=34750
*
* This work is licensed under the terms of the GNU LGPL, version 2 or later.
* See the COPYING.LIB file in the top-level directory.
@ -18,6 +18,11 @@
#ifndef BLOCK_VHDX_H
#define BLOCK_VHDX_H
#define KiB (1 * 1024)
#define MiB (KiB * 1024)
#define GiB (MiB * 1024)
#define TiB ((uint64_t) GiB * 1024)
/* Structures and fields present in the VHDX file */
/* The header section has the following blocks,
@ -30,14 +35,15 @@
* 0.........64KB...........128KB........192KB..........256KB................1MB
*/
#define VHDX_HEADER_BLOCK_SIZE (64*1024)
#define VHDX_HEADER_BLOCK_SIZE (64 * 1024)
#define VHDX_FILE_ID_OFFSET 0
#define VHDX_HEADER1_OFFSET (VHDX_HEADER_BLOCK_SIZE*1)
#define VHDX_HEADER2_OFFSET (VHDX_HEADER_BLOCK_SIZE*2)
#define VHDX_REGION_TABLE_OFFSET (VHDX_HEADER_BLOCK_SIZE*3)
#define VHDX_HEADER1_OFFSET (VHDX_HEADER_BLOCK_SIZE * 1)
#define VHDX_HEADER2_OFFSET (VHDX_HEADER_BLOCK_SIZE * 2)
#define VHDX_REGION_TABLE_OFFSET (VHDX_HEADER_BLOCK_SIZE * 3)
#define VHDX_REGION_TABLE2_OFFSET (VHDX_HEADER_BLOCK_SIZE * 4)
#define VHDX_HEADER_SECTION_END (1 * MiB)
/*
* A note on the use of MS-GUID fields. For more details on the GUID,
* please see: https://en.wikipedia.org/wiki/Globally_unique_identifier.
@ -55,10 +61,11 @@
/* These structures are ones that are defined in the VHDX specification
* document */
#define VHDX_FILE_SIGNATURE 0x656C696678646876 /* "vhdxfile" in ASCII */
typedef struct VHDXFileIdentifier {
uint64_t signature; /* "vhdxfile" in ASCII */
uint16_t creator[256]; /* optional; utf-16 string to identify
the vhdx file creator. Diagnotistic
the vhdx file creator. Diagnostic
only */
} VHDXFileIdentifier;
@ -67,7 +74,7 @@ typedef struct VHDXFileIdentifier {
* Microsoft is not just 16 bytes though - it is a structure that is defined,
* so we need to follow it here so that endianness does not trip us up */
typedef struct MSGUID {
typedef struct QEMU_PACKED MSGUID {
uint32_t data1;
uint16_t data2;
uint16_t data3;
@ -77,14 +84,15 @@ typedef struct MSGUID {
#define guid_eq(a, b) \
(memcmp(&(a), &(b), sizeof(MSGUID)) == 0)
#define VHDX_HEADER_SIZE (4*1024) /* although the vhdx_header struct in disk
is only 582 bytes, for purposes of crc
the header is the first 4KB of the 64KB
block */
#define VHDX_HEADER_SIZE (4 * 1024) /* although the vhdx_header struct in disk
is only 582 bytes, for purposes of crc
the header is the first 4KB of the 64KB
block */
/* The full header is 4KB, although the actual header data is much smaller.
* But for the checksum calculation, it is over the entire 4KB structure,
* not just the defined portion of it */
#define VHDX_HEADER_SIGNATURE 0x64616568
typedef struct QEMU_PACKED VHDXHeader {
uint32_t signature; /* "head" in ASCII */
uint32_t checksum; /* CRC-32C hash of the whole header */
@ -92,7 +100,7 @@ typedef struct QEMU_PACKED VHDXHeader {
VHDX file has 2 of these headers,
and only the header with the highest
sequence number is valid */
MSGUID file_write_guid; /* 128 bit unique identifier. Must be
MSGUID file_write_guid; /* 128 bit unique identifier. Must be
updated to new, unique value before
the first modification is made to
file */
@ -114,9 +122,9 @@ typedef struct QEMU_PACKED VHDXHeader {
there is no valid log. If non-zero,
log entries with this guid are
valid. */
uint16_t log_version; /* version of the log format. Mustn't be
zero, unless log_guid is also zero */
uint16_t version; /* version of th evhdx file. Currently,
uint16_t log_version; /* version of the log format. Must be
set to zero */
uint16_t version; /* version of the vhdx file. Currently,
only supported version is "1" */
uint32_t log_length; /* length of the log. Must be multiple
of 1MB */
@ -125,6 +133,7 @@ typedef struct QEMU_PACKED VHDXHeader {
} VHDXHeader;
/* Header for the region table block */
#define VHDX_REGION_SIGNATURE 0x69676572 /* "regi" in ASCII */
typedef struct QEMU_PACKED VHDXRegionTableHeader {
uint32_t signature; /* "regi" in ASCII */
uint32_t checksum; /* CRC-32C hash of the 64KB table */
@ -151,7 +160,10 @@ typedef struct QEMU_PACKED VHDXRegionTableEntry {
/* ---- LOG ENTRY STRUCTURES ---- */
#define VHDX_LOG_MIN_SIZE (1024 * 1024)
#define VHDX_LOG_SECTOR_SIZE 4096
#define VHDX_LOG_HDR_SIZE 64
#define VHDX_LOG_SIGNATURE 0x65676f6c
typedef struct QEMU_PACKED VHDXLogEntryHeader {
uint32_t signature; /* "loge" in ASCII */
uint32_t checksum; /* CRC-32C hash of the 64KB table */
@ -174,7 +186,8 @@ typedef struct QEMU_PACKED VHDXLogEntryHeader {
} VHDXLogEntryHeader;
#define VHDX_LOG_DESC_SIZE 32
#define VHDX_LOG_DESC_SIGNATURE 0x63736564
#define VHDX_LOG_ZERO_SIGNATURE 0x6f72657a
typedef struct QEMU_PACKED VHDXLogDescriptor {
uint32_t signature; /* "zero" or "desc" in ASCII */
union {
@ -194,6 +207,7 @@ typedef struct QEMU_PACKED VHDXLogDescriptor {
vhdx_log_entry_header */
} VHDXLogDescriptor;
#define VHDX_LOG_DATA_SIGNATURE 0x61746164
typedef struct QEMU_PACKED VHDXLogDataSector {
uint32_t data_signature; /* "data" in ASCII */
uint32_t sequence_high; /* 4 MSB of 8 byte sequence_number */
@ -212,19 +226,19 @@ typedef struct QEMU_PACKED VHDXLogDataSector {
#define PAYLOAD_BLOCK_UNDEFINED 1
#define PAYLOAD_BLOCK_ZERO 2
#define PAYLOAD_BLOCK_UNMAPPED 5
#define PAYLOAD_BLOCK_FULL_PRESENT 6
#define PAYLOAD_BLOCK_FULLY_PRESENT 6
#define PAYLOAD_BLOCK_PARTIALLY_PRESENT 7
#define SB_BLOCK_NOT_PRESENT 0
#define SB_BLOCK_PRESENT 6
/* per the spec */
#define VHDX_MAX_SECTORS_PER_BLOCK (1<<23)
#define VHDX_MAX_SECTORS_PER_BLOCK (1 << 23)
/* upper 44 bits are the file offset in 1MB units; lower 3 bits are the state;
other bits are reserved */
#define VHDX_BAT_STATE_BIT_MASK 0x07
#define VHDX_BAT_FILE_OFF_BITS (64-44)
#define VHDX_BAT_FILE_OFF_MASK 0xFFFFFFFFFFF00000 /* upper 44 bits */
typedef uint64_t VHDXBatEntry;
/* ---- METADATA REGION STRUCTURES ---- */
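
The BAT entry layout described above (state in the low 3 bits, file offset in 1MB units in the upper 44 bits) can be decoded with two masks. This is a sketch under the assumption that the entry has already been converted to host byte order; the `DEMO_*` macros repeat the mask values from the header and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BAT_STATE_MASK    UINT64_C(0x07)
#define DEMO_BAT_FILE_OFF_MASK UINT64_C(0xFFFFFFFFFFF00000) /* upper 44 bits */

/* Extract the payload block state (for example 6 for fully present). */
static unsigned demo_bat_state(uint64_t entry)
{
    return (unsigned)(entry & DEMO_BAT_STATE_MASK);
}

/*
 * Extract the file offset in bytes. The offset field starts at bit 20 and
 * counts 1MB units, so masking alone already yields a byte offset.
 */
static uint64_t demo_bat_file_offset(uint64_t entry)
{
    return entry & DEMO_BAT_FILE_OFF_MASK;
}
```

For an entry with offset field 3 and state 6, the decoded byte offset is 3 * 1MB = 3145728.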
@ -233,6 +247,7 @@ typedef uint64_t VHDXBatEntry;
#define VHDX_METADATA_MAX_ENTRIES 2047 /* not including the header */
#define VHDX_METADATA_TABLE_MAX_SIZE \
(VHDX_METADATA_ENTRY_SIZE * (VHDX_METADATA_MAX_ENTRIES+1))
#define VHDX_METADATA_SIGNATURE 0x617461646174656D /* "metadata" in ASCII */
typedef struct QEMU_PACKED VHDXMetadataTableHeader {
uint64_t signature; /* "metadata" in ASCII */
uint16_t reserved;
@ -252,8 +267,8 @@ typedef struct QEMU_PACKED VHDXMetadataTableEntry {
metadata region */
/* note: if length = 0, so is offset */
uint32_t length; /* length of metadata. <= 1MB. */
uint32_t data_bits; /* least-significant 3 bits are flags, the
rest are reserved (see above) */
uint32_t data_bits; /* least-significant 3 bits are flags,
the rest are reserved (see above) */
uint32_t reserved2;
} VHDXMetadataTableEntry;
@ -262,13 +277,16 @@ typedef struct QEMU_PACKED VHDXMetadataTableEntry {
If set indicates a fixed
size VHDX file */
#define VHDX_PARAMS_HAS_PARENT 0x02 /* has parent / backing file */
#define VHDX_BLOCK_SIZE_MIN (1 * MiB)
#define VHDX_BLOCK_SIZE_MAX (256 * MiB)
typedef struct QEMU_PACKED VHDXFileParameters {
uint32_t block_size; /* size of each payload block, always
power of 2, <= 256MB and >= 1MB. */
uint32_t data_bits; /* least-significant 2 bits are flags, the rest
are reserved (see above) */
uint32_t data_bits; /* least-significant 2 bits are flags,
the rest are reserved (see above) */
} VHDXFileParameters;
#define VHDX_MAX_IMAGE_SIZE ((uint64_t) 64 * TiB)
typedef struct QEMU_PACKED VHDXVirtualDiskSize {
uint64_t virtual_disk_size; /* Size of the virtual disk, in bytes.
Must be multiple of the sector size,
@ -276,7 +294,7 @@ typedef struct QEMU_PACKED VHDXVirtualDiskSize {
} VHDXVirtualDiskSize;
typedef struct QEMU_PACKED VHDXPage83Data {
MSGUID page_83_data[16]; /* unique id for scsi devices that
MSGUID page_83_data; /* unique id for scsi devices that
support page 0x83 */
} VHDXPage83Data;
@ -291,7 +309,7 @@ typedef struct QEMU_PACKED VHDXVirtualDiskPhysicalSectorSize {
} VHDXVirtualDiskPhysicalSectorSize;
typedef struct QEMU_PACKED VHDXParentLocatorHeader {
MSGUID locator_type[16]; /* type of the parent virtual disk. */
MSGUID locator_type; /* type of the parent virtual disk. */
uint16_t reserved;
uint16_t key_value_count; /* number of key/value pairs for this
locator */
@ -308,18 +326,122 @@ typedef struct QEMU_PACKED VHDXParentLocatorEntry {
/* ----- END VHDX SPECIFICATION STRUCTURES ---- */
typedef struct VHDXMetadataEntries {
VHDXMetadataTableEntry file_parameters_entry;
VHDXMetadataTableEntry virtual_disk_size_entry;
VHDXMetadataTableEntry page83_data_entry;
VHDXMetadataTableEntry logical_sector_size_entry;
VHDXMetadataTableEntry phys_sector_size_entry;
VHDXMetadataTableEntry parent_locator_entry;
uint16_t present;
} VHDXMetadataEntries;
typedef struct VHDXLogEntries {
uint64_t offset;
uint64_t length;
uint32_t write;
uint32_t read;
VHDXLogEntryHeader *hdr;
void *desc_buffer;
uint64_t sequence;
uint32_t tail;
} VHDXLogEntries;
typedef struct VHDXRegionEntry {
uint64_t start;
uint64_t end;
QLIST_ENTRY(VHDXRegionEntry) entries;
} VHDXRegionEntry;
typedef struct BDRVVHDXState {
CoMutex lock;
int curr_header;
VHDXHeader *headers[2];
VHDXRegionTableHeader rt;
VHDXRegionTableEntry bat_rt; /* region table for the BAT */
VHDXRegionTableEntry metadata_rt; /* region table for the metadata */
VHDXMetadataTableHeader metadata_hdr;
VHDXMetadataEntries metadata_entries;
VHDXFileParameters params;
uint32_t block_size;
uint32_t block_size_bits;
uint32_t sectors_per_block;
uint32_t sectors_per_block_bits;
uint64_t virtual_disk_size;
uint32_t logical_sector_size;
uint32_t physical_sector_size;
uint64_t chunk_ratio;
uint32_t chunk_ratio_bits;
uint32_t logical_sector_size_bits;
uint32_t bat_entries;
VHDXBatEntry *bat;
uint64_t bat_offset;
bool first_visible_write;
MSGUID session_guid;
VHDXLogEntries log;
VHDXParentLocatorHeader parent_header;
VHDXParentLocatorEntry *parent_entries;
Error *migration_blocker;
QLIST_HEAD(VHDXRegionHead, VHDXRegionEntry) regions;
} BDRVVHDXState;
void vhdx_guid_generate(MSGUID *guid);
int vhdx_update_headers(BlockDriverState *bs, BDRVVHDXState *s, bool rw,
MSGUID *log_guid);
uint32_t vhdx_update_checksum(uint8_t *buf, size_t size, int crc_offset);
uint32_t vhdx_checksum_calc(uint32_t crc, uint8_t *buf, size_t size,
int crc_offset);
bool vhdx_checksum_is_valid(uint8_t *buf, size_t size, int crc_offset);
int vhdx_parse_log(BlockDriverState *bs, BDRVVHDXState *s, bool *flushed);
static void leguid_to_cpus(MSGUID *guid)
int vhdx_log_write_and_flush(BlockDriverState *bs, BDRVVHDXState *s,
void *data, uint32_t length, uint64_t offset);
static inline void leguid_to_cpus(MSGUID *guid)
{
le32_to_cpus(&guid->data1);
le16_to_cpus(&guid->data2);
le16_to_cpus(&guid->data3);
}
static inline void cpu_to_leguids(MSGUID *guid)
{
cpu_to_le32s(&guid->data1);
cpu_to_le16s(&guid->data2);
cpu_to_le16s(&guid->data3);
}
void vhdx_header_le_import(VHDXHeader *h);
void vhdx_header_le_export(VHDXHeader *orig_h, VHDXHeader *new_h);
void vhdx_log_desc_le_import(VHDXLogDescriptor *d);
void vhdx_log_desc_le_export(VHDXLogDescriptor *d);
void vhdx_log_data_le_export(VHDXLogDataSector *d);
void vhdx_log_entry_hdr_le_import(VHDXLogEntryHeader *hdr);
void vhdx_log_entry_hdr_le_export(VHDXLogEntryHeader *hdr);
void vhdx_region_header_le_import(VHDXRegionTableHeader *hdr);
void vhdx_region_header_le_export(VHDXRegionTableHeader *hdr);
void vhdx_region_entry_le_import(VHDXRegionTableEntry *e);
void vhdx_region_entry_le_export(VHDXRegionTableEntry *e);
void vhdx_metadata_header_le_import(VHDXMetadataTableHeader *hdr);
void vhdx_metadata_header_le_export(VHDXMetadataTableHeader *hdr);
void vhdx_metadata_entry_le_import(VHDXMetadataTableEntry *e);
void vhdx_metadata_entry_le_export(VHDXMetadataTableEntry *e);
int vhdx_user_visible_write(BlockDriverState *bs, BDRVVHDXState *s);
#endif


@ -105,18 +105,22 @@ typedef struct VmdkExtent {
uint32_t l2_cache_offsets[L2_CACHE_SIZE];
uint32_t l2_cache_counts[L2_CACHE_SIZE];
unsigned int cluster_sectors;
int64_t cluster_sectors;
char *type;
} VmdkExtent;
typedef struct BDRVVmdkState {
CoMutex lock;
uint64_t desc_offset;
bool cid_updated;
bool cid_checked;
uint32_t cid;
uint32_t parent_cid;
int num_extents;
/* Extent array with num_extents entries, ascend ordered by address */
VmdkExtent *extents;
Error *migration_blocker;
char *create_type;
} BDRVVmdkState;
typedef struct VmdkMetaData {
@ -197,8 +201,6 @@ static int vmdk_probe(const uint8_t *buf, int buf_size, const char *filename)
}
}
#define CHECK_CID 1
#define SECTOR_SIZE 512
#define DESC_SIZE (20 * SECTOR_SIZE) /* 20 sectors of 512 bytes each */
#define BUF_SIZE 4096
@ -215,8 +217,9 @@ static void vmdk_free_extents(BlockDriverState *bs)
g_free(e->l1_table);
g_free(e->l2_cache);
g_free(e->l1_backup_table);
g_free(e->type);
if (e->file != bs->file) {
bdrv_delete(e->file);
bdrv_unref(e->file);
}
}
g_free(s->extents);
@ -301,19 +304,18 @@ static int vmdk_write_cid(BlockDriverState *bs, uint32_t cid)
static int vmdk_is_cid_valid(BlockDriverState *bs)
{
#ifdef CHECK_CID
BDRVVmdkState *s = bs->opaque;
BlockDriverState *p_bs = bs->backing_hd;
uint32_t cur_pcid;
if (p_bs) {
if (!s->cid_checked && p_bs) {
cur_pcid = vmdk_read_cid(p_bs, 0);
if (s->parent_cid != cur_pcid) {
/* CID not valid */
return 0;
}
}
#endif
s->cid_checked = true;
/* CID valid */
return 1;
}
@ -331,8 +333,7 @@ static int vmdk_reopen_prepare(BDRVReopenState *state,
assert(state->bs != NULL);
if (queue == NULL) {
error_set(errp, ERROR_CLASS_GENERIC_ERROR,
"No reopen queue for VMDK extents");
error_setg(errp, "No reopen queue for VMDK extents");
goto exit;
}
@ -391,15 +392,24 @@ static int vmdk_add_extent(BlockDriverState *bs,
int64_t l1_offset, int64_t l1_backup_offset,
uint32_t l1_size,
int l2_size, uint64_t cluster_sectors,
VmdkExtent **new_extent)
VmdkExtent **new_extent,
Error **errp)
{
VmdkExtent *extent;
BDRVVmdkState *s = bs->opaque;
if (cluster_sectors > 0x200000) {
/* 0x200000 * 512Bytes = 1GB for one cluster is unrealistic */
error_report("invalid granularity, image may be corrupt");
return -EINVAL;
error_setg(errp, "Invalid granularity, image may be corrupt");
return -EFBIG;
}
if (l1_size > 512 * 1024 * 1024) {
/* Although with big capacity and small l1_entry_sectors, we can get a
* big l1_size, we don't want unbounded value to allocate the table.
* Limit it to 512M, which is 16PB for default cluster and L2 table
* size */
error_setg(errp, "L1 size too big");
return -EFBIG;
}
s->extents = g_realloc(s->extents,
@ -416,7 +426,7 @@ static int vmdk_add_extent(BlockDriverState *bs,
extent->l1_size = l1_size;
extent->l1_entry_sectors = l2_size * cluster_sectors;
extent->l2_size = l2_size;
extent->cluster_sectors = cluster_sectors;
extent->cluster_sectors = flat ? sectors : cluster_sectors;
if (s->num_extents > 1) {
extent->end_sector = (*(extent - 1)).end_sector + extent->sectors;
@ -430,7 +440,8 @@ static int vmdk_add_extent(BlockDriverState *bs,
return 0;
}
static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent)
static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
Error **errp)
{
int ret;
int l1_size, i;
@ -439,10 +450,13 @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent)
l1_size = extent->l1_size * sizeof(uint32_t);
extent->l1_table = g_malloc(l1_size);
ret = bdrv_pread(extent->file,
extent->l1_table_offset,
extent->l1_table,
l1_size);
extent->l1_table_offset,
extent->l1_table,
l1_size);
if (ret < 0) {
error_setg_errno(errp, -ret,
"Could not read l1 table from extent '%s'",
extent->file->filename);
goto fail_l1;
}
for (i = 0; i < extent->l1_size; i++) {
@ -452,10 +466,13 @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent)
if (extent->l1_backup_table_offset) {
extent->l1_backup_table = g_malloc(l1_size);
ret = bdrv_pread(extent->file,
extent->l1_backup_table_offset,
extent->l1_backup_table,
l1_size);
extent->l1_backup_table_offset,
extent->l1_backup_table,
l1_size);
if (ret < 0) {
error_setg_errno(errp, -ret,
"Could not read l1 backup table from extent '%s'",
extent->file->filename);
goto fail_l1b;
}
for (i = 0; i < extent->l1_size; i++) {
@ -473,9 +490,9 @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent)
return ret;
}
static int vmdk_open_vmdk3(BlockDriverState *bs,
BlockDriverState *file,
int flags)
static int vmdk_open_vmfs_sparse(BlockDriverState *bs,
BlockDriverState *file,
int flags, Error **errp)
{
int ret;
uint32_t magic;
@ -484,20 +501,24 @@ static int vmdk_open_vmdk3(BlockDriverState *bs,
ret = bdrv_pread(file, sizeof(magic), &header, sizeof(header));
if (ret < 0) {
error_setg_errno(errp, -ret,
"Could not read header from file '%s'",
file->filename);
return ret;
}
ret = vmdk_add_extent(bs,
bs->file, false,
le32_to_cpu(header.disk_sectors),
le32_to_cpu(header.l1dir_offset) << 9,
0, 1 << 6, 1 << 9,
le32_to_cpu(header.granularity),
&extent);
ret = vmdk_add_extent(bs, file, false,
le32_to_cpu(header.disk_sectors),
le32_to_cpu(header.l1dir_offset) << 9,
0,
le32_to_cpu(header.l1dir_size),
4096,
le32_to_cpu(header.granularity),
&extent,
errp);
if (ret < 0) {
return ret;
}
ret = vmdk_init_tables(bs, extent);
ret = vmdk_init_tables(bs, extent, errp);
if (ret) {
/* free extent allocated by vmdk_add_extent */
vmdk_free_last_extent(bs);
@ -506,30 +527,37 @@ static int vmdk_open_vmdk3(BlockDriverState *bs,
}
static int vmdk_open_desc_file(BlockDriverState *bs, int flags,
uint64_t desc_offset);
uint64_t desc_offset, Error **errp);
static int vmdk_open_vmdk4(BlockDriverState *bs,
BlockDriverState *file,
int flags)
int flags, Error **errp)
{
int ret;
uint32_t magic;
uint32_t l1_size, l1_entry_sectors;
VMDK4Header header;
VmdkExtent *extent;
BDRVVmdkState *s = bs->opaque;
int64_t l1_backup_offset = 0;
ret = bdrv_pread(file, sizeof(magic), &header, sizeof(header));
if (ret < 0) {
return ret;
error_setg_errno(errp, -ret,
"Could not read header from file '%s'",
file->filename);
}
if (header.capacity == 0) {
uint64_t desc_offset = le64_to_cpu(header.desc_offset);
if (desc_offset) {
return vmdk_open_desc_file(bs, flags, desc_offset << 9);
return vmdk_open_desc_file(bs, flags, desc_offset << 9, errp);
}
}
if (!s->create_type) {
s->create_type = g_strdup("monolithicSparse");
}
if (le64_to_cpu(header.gd_offset) == VMDK4_GD_AT_END) {
/*
* The footer takes precedence over the header, so read it in. The
@ -598,14 +626,6 @@ static int vmdk_open_vmdk4(BlockDriverState *bs,
}
l1_size = (le64_to_cpu(header.capacity) + l1_entry_sectors - 1)
/ l1_entry_sectors;
if (l1_size > 512 * 1024 * 1024) {
/* although with big capacity and small l1_entry_sectors, we can get a
* big l1_size, we don't want unbounded value to allocate the table.
* Limit it to 512M, which is 16PB for default cluster and L2 table
* size */
error_report("L1 size too big");
return -EFBIG;
}
if (le32_to_cpu(header.flags) & VMDK4_FLAG_RGD) {
l1_backup_offset = le64_to_cpu(header.rgd_offset) << 9;
}
@ -616,7 +636,8 @@ static int vmdk_open_vmdk4(BlockDriverState *bs,
l1_size,
le32_to_cpu(header.num_gtes_per_gt),
le64_to_cpu(header.granularity),
&extent);
&extent,
errp);
if (ret < 0) {
return ret;
}
@ -625,7 +646,7 @@ static int vmdk_open_vmdk4(BlockDriverState *bs,
extent->has_marker = le32_to_cpu(header.flags) & VMDK4_FLAG_MARKER;
extent->version = le32_to_cpu(header.version);
extent->has_zero_grain = le32_to_cpu(header.flags) & VMDK4_FLAG_ZERO_GRAIN;
ret = vmdk_init_tables(bs, extent);
ret = vmdk_init_tables(bs, extent, errp);
if (ret) {
/* free extent allocated by vmdk_add_extent */
vmdk_free_last_extent(bs);
@ -663,7 +684,7 @@ static int vmdk_parse_description(const char *desc, const char *opt_name,
/* Open an extent file and append to bs array */
static int vmdk_open_sparse(BlockDriverState *bs,
BlockDriverState *file,
int flags)
int flags, Error **errp)
{
uint32_t magic;
@ -674,10 +695,10 @@ static int vmdk_open_sparse(BlockDriverState *bs,
magic = be32_to_cpu(magic);
switch (magic) {
case VMDK3_MAGIC:
return vmdk_open_vmdk3(bs, file, flags);
return vmdk_open_vmfs_sparse(bs, file, flags, errp);
break;
case VMDK4_MAGIC:
return vmdk_open_vmdk4(bs, file, flags);
return vmdk_open_vmdk4(bs, file, flags, errp);
break;
default:
return -EMEDIUMTYPE;
@ -686,7 +707,7 @@ static int vmdk_open_sparse(BlockDriverState *bs,
}
static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
const char *desc_file_path)
const char *desc_file_path, Error **errp)
{
int ret;
char access[11];
@ -697,6 +718,8 @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
int64_t flat_offset;
char extent_path[PATH_MAX];
BlockDriverState *extent_file;
BDRVVmdkState *s = bs->opaque;
VmdkExtent *extent;
while (*p) {
/* parse extent line:
@ -711,60 +734,69 @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
goto next_line;
} else if (!strcmp(type, "FLAT")) {
if (ret != 5 || flat_offset < 0) {
error_setg(errp, "Invalid extent lines: \n%s", p);
return -EINVAL;
}
} else if (!strcmp(type, "VMFS")) {
flat_offset = 0;
} else if (ret != 4) {
error_setg(errp, "Invalid extent lines: \n%s", p);
return -EINVAL;
}
if (sectors <= 0 ||
(strcmp(type, "FLAT") && strcmp(type, "SPARSE")) ||
(strcmp(type, "FLAT") && strcmp(type, "SPARSE") &&
strcmp(type, "VMFS") && strcmp(type, "VMFSSPARSE")) ||
(strcmp(access, "RW"))) {
goto next_line;
}
path_combine(extent_path, sizeof(extent_path),
desc_file_path, fname);
ret = bdrv_file_open(&extent_file, extent_path, NULL, bs->open_flags);
ret = bdrv_file_open(&extent_file, extent_path, NULL, bs->open_flags,
errp);
if (ret) {
return ret;
}
/* save to extents array */
if (!strcmp(type, "FLAT")) {
if (!strcmp(type, "FLAT") || !strcmp(type, "VMFS")) {
/* FLAT extent */
VmdkExtent *extent;
ret = vmdk_add_extent(bs, extent_file, true, sectors,
0, 0, 0, 0, sectors, &extent);
0, 0, 0, 0, 0, &extent, errp);
if (ret < 0) {
return ret;
}
extent->flat_start_offset = flat_offset << 9;
} else if (!strcmp(type, "SPARSE")) {
/* SPARSE extent */
ret = vmdk_open_sparse(bs, extent_file, bs->open_flags);
} else if (!strcmp(type, "SPARSE") || !strcmp(type, "VMFSSPARSE")) {
/* SPARSE extent and VMFSSPARSE extent are both "COWD" sparse file*/
ret = vmdk_open_sparse(bs, extent_file, bs->open_flags, errp);
if (ret) {
bdrv_delete(extent_file);
bdrv_unref(extent_file);
return ret;
}
extent = &s->extents[s->num_extents - 1];
} else {
fprintf(stderr,
"VMDK: Not supported extent type \"%s\""".\n", type);
error_setg(errp, "Unsupported extent type '%s'", type);
return -ENOTSUP;
}
extent->type = g_strdup(type);
next_line:
/* move to next line */
while (*p && *p != '\n') {
while (*p) {
if (*p == '\n') {
p++;
break;
}
p++;
}
p++;
}
return 0;
}
static int vmdk_open_desc_file(BlockDriverState *bs, int flags,
uint64_t desc_offset)
uint64_t desc_offset, Error **errp)
{
int ret;
char *buf = NULL;
@ -789,29 +821,32 @@ static int vmdk_open_desc_file(BlockDriverState *bs, int flags,
goto exit;
}
if (strcmp(ct, "monolithicFlat") &&
strcmp(ct, "vmfs") &&
strcmp(ct, "vmfsSparse") &&
strcmp(ct, "twoGbMaxExtentSparse") &&
strcmp(ct, "twoGbMaxExtentFlat")) {
fprintf(stderr,
"VMDK: Not supported image type \"%s\""".\n", ct);
error_setg(errp, "Unsupported image type '%s'", ct);
ret = -ENOTSUP;
goto exit;
}
s->create_type = g_strdup(ct);
s->desc_offset = 0;
ret = vmdk_parse_extents(buf, bs, bs->file->filename);
ret = vmdk_parse_extents(buf, bs, bs->file->filename, errp);
exit:
g_free(buf);
return ret;
}
static int vmdk_open(BlockDriverState *bs, QDict *options, int flags)
static int vmdk_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
int ret;
BDRVVmdkState *s = bs->opaque;
if (vmdk_open_sparse(bs, bs->file, flags) == 0) {
if (vmdk_open_sparse(bs, bs->file, flags, errp) == 0) {
s->desc_offset = 0x200;
} else {
ret = vmdk_open_desc_file(bs, flags, 0);
ret = vmdk_open_desc_file(bs, flags, 0, errp);
if (ret) {
goto fail;
}
@ -821,6 +856,7 @@ static int vmdk_open(BlockDriverState *bs, QDict *options, int flags)
if (ret) {
goto fail;
}
s->cid = vmdk_read_cid(bs, 0);
s->parent_cid = vmdk_read_cid(bs, 1);
qemu_co_mutex_init(&s->lock);
@ -833,6 +869,8 @@ static int vmdk_open(BlockDriverState *bs, QDict *options, int flags)
return 0;
fail:
g_free(s->create_type);
s->create_type = NULL;
vmdk_free_extents(bs);
return ret;
}
@ -1039,7 +1077,7 @@ static VmdkExtent *find_extent(BDRVVmdkState *s,
return NULL;
}
static int coroutine_fn vmdk_co_is_allocated(BlockDriverState *bs,
static int64_t coroutine_fn vmdk_co_get_block_status(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, int *pnum)
{
BDRVVmdkState *s = bs->opaque;
@ -1056,7 +1094,24 @@ static int coroutine_fn vmdk_co_is_allocated(BlockDriverState *bs,
sector_num * 512, 0, &offset);
qemu_co_mutex_unlock(&s->lock);
ret = (ret == VMDK_OK || ret == VMDK_ZEROED);
switch (ret) {
case VMDK_ERROR:
ret = -EIO;
break;
case VMDK_UNALLOC:
ret = 0;
break;
case VMDK_ZEROED:
ret = BDRV_BLOCK_ZERO;
break;
case VMDK_OK:
ret = BDRV_BLOCK_DATA;
if (extent->file == bs->file) {
ret |= BDRV_BLOCK_OFFSET_VALID | offset;
}
break;
}
index_in_cluster = sector_num % extent->cluster_sectors;
n = extent->cluster_sectors - index_in_cluster;
@ -1261,8 +1316,7 @@ static int vmdk_write(BlockDriverState *bs, int64_t sector_num,
VmdkMetaData m_data;
if (sector_num > bs->total_sectors) {
fprintf(stderr,
"(VMDK) Wrong offset: sector_num=0x%" PRIx64
error_report("Wrong offset: sector_num=0x%" PRIx64
" total_sectors=0x%" PRIx64 "\n",
sector_num, bs->total_sectors);
return -EIO;
@ -1282,9 +1336,8 @@ static int vmdk_write(BlockDriverState *bs, int64_t sector_num,
if (extent->compressed) {
if (ret == VMDK_OK) {
/* Refuse write to allocated cluster for streamOptimized */
fprintf(stderr,
"VMDK: can't write to allocated cluster"
" for streamOptimized\n");
error_report("Could not write to allocated cluster"
" for streamOptimized");
return -EIO;
} else {
/* allocate */
@ -1381,7 +1434,6 @@ static int coroutine_fn vmdk_co_write_zeroes(BlockDriverState *bs,
return ret;
}
static int vmdk_create_extent(const char *filename, int64_t filesize,
bool flat, bool compress, bool zeroed_grain)
{
@ -1493,12 +1545,12 @@ static int vmdk_create_extent(const char *filename, int64_t filesize,
}
static int filename_decompose(const char *filename, char *path, char *prefix,
char *postfix, size_t buf_len)
char *postfix, size_t buf_len, Error **errp)
{
const char *p, *q;
if (filename == NULL || !strlen(filename)) {
fprintf(stderr, "Vmdk: no filename provided.\n");
error_setg(errp, "No filename provided");
return VMDK_ERROR;
}
p = strrchr(filename, '/');
@ -1532,7 +1584,8 @@ static int filename_decompose(const char *filename, char *path, char *prefix,
return VMDK_OK;
}
static int vmdk_create(const char *filename, QEMUOptionParameter *options)
static int vmdk_create(const char *filename, QEMUOptionParameter *options,
Error **errp)
{
int fd, idx = 0;
char desc[BUF_SIZE];
@ -1571,7 +1624,7 @@ static int vmdk_create(const char *filename, QEMUOptionParameter *options)
"ddb.geometry.sectors = \"63\"\n"
"ddb.adapterType = \"%s\"\n";
if (filename_decompose(filename, path, prefix, postfix, PATH_MAX)) {
if (filename_decompose(filename, path, prefix, postfix, PATH_MAX, errp)) {
return -EINVAL;
}
/* Read out options */
@ -1597,7 +1650,7 @@ static int vmdk_create(const char *filename, QEMUOptionParameter *options)
strcmp(adapter_type, "buslogic") &&
strcmp(adapter_type, "lsilogic") &&
strcmp(adapter_type, "legacyESX")) {
fprintf(stderr, "VMDK: Unknown adapter type: '%s'.\n", adapter_type);
error_setg(errp, "Unknown adapter type: '%s'", adapter_type);
return -EINVAL;
}
if (strcmp(adapter_type, "ide") != 0) {
@ -1613,7 +1666,7 @@ static int vmdk_create(const char *filename, QEMUOptionParameter *options)
strcmp(fmt, "twoGbMaxExtentSparse") &&
strcmp(fmt, "twoGbMaxExtentFlat") &&
strcmp(fmt, "streamOptimized")) {
fprintf(stderr, "VMDK: Unknown subformat: %s\n", fmt);
error_setg(errp, "Unknown subformat: '%s'", fmt);
return -EINVAL;
}
split = !(strcmp(fmt, "twoGbMaxExtentFlat") &&
@ -1627,22 +1680,26 @@ static int vmdk_create(const char *filename, QEMUOptionParameter *options)
desc_extent_line = "RW %lld SPARSE \"%s\"\n";
}
if (flat && backing_file) {
/* not supporting backing file for flat image */
error_setg(errp, "Flat image can't have backing file");
return -ENOTSUP;
}
if (flat && zeroed_grain) {
error_setg(errp, "Flat image can't enable zeroed grain");
return -ENOTSUP;
}
if (backing_file) {
BlockDriverState *bs = bdrv_new("");
ret = bdrv_open(bs, backing_file, NULL, 0, NULL);
ret = bdrv_open(bs, backing_file, NULL, 0, NULL, errp);
if (ret != 0) {
bdrv_delete(bs);
bdrv_unref(bs);
return ret;
}
if (strcmp(bs->drv->format_name, "vmdk")) {
bdrv_delete(bs);
bdrv_unref(bs);
return -EINVAL;
}
parent_cid = vmdk_read_cid(bs, 0);
bdrv_delete(bs);
bdrv_unref(bs);
snprintf(parent_desc_line, sizeof(parent_desc_line),
"parentFileNameHint=\"%s\"", backing_file);
}
@ -1725,6 +1782,7 @@ static void vmdk_close(BlockDriverState *bs)
BDRVVmdkState *s = bs->opaque;
vmdk_free_extents(bs);
g_free(s->create_type);
migrate_del_blocker(s->migration_blocker);
error_free(s->migration_blocker);
@ -1786,6 +1844,54 @@ static int vmdk_has_zero_init(BlockDriverState *bs)
return 1;
}
static ImageInfo *vmdk_get_extent_info(VmdkExtent *extent)
{
ImageInfo *info = g_new0(ImageInfo, 1);
*info = (ImageInfo){
.filename = g_strdup(extent->file->filename),
.format = g_strdup(extent->type),
.virtual_size = extent->sectors * BDRV_SECTOR_SIZE,
.compressed = extent->compressed,
.has_compressed = extent->compressed,
.cluster_size = extent->cluster_sectors * BDRV_SECTOR_SIZE,
.has_cluster_size = !extent->flat,
};
return info;
}
static ImageInfoSpecific *vmdk_get_specific_info(BlockDriverState *bs)
{
int i;
BDRVVmdkState *s = bs->opaque;
ImageInfoSpecific *spec_info = g_new0(ImageInfoSpecific, 1);
ImageInfoList **next;
*spec_info = (ImageInfoSpecific){
.kind = IMAGE_INFO_SPECIFIC_KIND_VMDK,
{
.vmdk = g_new0(ImageInfoSpecificVmdk, 1),
},
};
*spec_info->vmdk = (ImageInfoSpecificVmdk) {
.create_type = g_strdup(s->create_type),
.cid = s->cid,
.parent_cid = s->parent_cid,
};
next = &spec_info->vmdk->extents;
for (i = 0; i < s->num_extents; i++) {
*next = g_new0(ImageInfoList, 1);
(*next)->value = vmdk_get_extent_info(&s->extents[i]);
(*next)->next = NULL;
next = &(*next)->next;
}
return spec_info;
}
static QEMUOptionParameter vmdk_create_options[] = {
{
.name = BLOCK_OPT_SIZE,
@ -1835,9 +1941,10 @@ static BlockDriver bdrv_vmdk = {
.bdrv_close = vmdk_close,
.bdrv_create = vmdk_create,
.bdrv_co_flush_to_disk = vmdk_co_flush,
.bdrv_co_is_allocated = vmdk_co_is_allocated,
.bdrv_co_get_block_status = vmdk_co_get_block_status,
.bdrv_get_allocated_file_size = vmdk_get_allocated_file_size,
.bdrv_has_zero_init = vmdk_has_zero_init,
.bdrv_get_specific_info = vmdk_get_specific_info,
.create_options = vmdk_create_options,
};


@ -46,7 +46,7 @@ enum vhd_type {
#define VHD_TIMESTAMP_BASE 946684800
// always big-endian
struct vhd_footer {
typedef struct vhd_footer {
char creator[8]; // "conectix"
uint32_t features;
uint32_t version;
@ -79,9 +79,9 @@ struct vhd_footer {
uint8_t uuid[16];
uint8_t in_saved_state;
};
} QEMU_PACKED VHDFooter;
struct vhd_dyndisk_header {
typedef struct vhd_dyndisk_header {
char magic[8]; // "cxsparse"
// Offset of next header structure, 0xFFFFFFFF if none
@ -111,7 +111,7 @@ struct vhd_dyndisk_header {
uint32_t reserved;
uint64_t data_offset;
} parent_locator[8];
};
} QEMU_PACKED VHDDynDiskHeader;
typedef struct BDRVVPCState {
CoMutex lock;
@ -155,12 +155,13 @@ static int vpc_probe(const uint8_t *buf, int buf_size, const char *filename)
return 0;
}
static int vpc_open(BlockDriverState *bs, QDict *options, int flags)
static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVVPCState *s = bs->opaque;
int i;
struct vhd_footer* footer;
struct vhd_dyndisk_header* dyndisk_header;
VHDFooter *footer;
VHDDynDiskHeader *dyndisk_header;
uint8_t buf[HEADER_SIZE];
uint32_t checksum;
int disk_type = VHD_DYNAMIC;
@ -171,7 +172,7 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags)
goto fail;
}
footer = (struct vhd_footer*) s->footer_buf;
footer = (VHDFooter *) s->footer_buf;
if (strncmp(footer->creator, "conectix", 8)) {
int64_t offset = bdrv_getlength(bs->file);
if (offset < 0) {
@ -210,6 +211,15 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags)
bs->total_sectors = (int64_t)
be16_to_cpu(footer->cyls) * footer->heads * footer->secs_per_cyl;
/* images created with disk2vhd report a far higher virtual size
* than expected with the cyls * heads * sectors_per_cyl formula.
* use the footer->size instead if the image was created with
* disk2vhd.
*/
if (!strncmp(footer->creator_app, "d2v", 4)) {
bs->total_sectors = be64_to_cpu(footer->size) / BDRV_SECTOR_SIZE;
}
/* Allow a maximum disk size of approximately 2 TB */
if (bs->total_sectors >= 65535LL * 255 * 255) {
ret = -EFBIG;
@ -223,7 +233,7 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags)
goto fail;
}
dyndisk_header = (struct vhd_dyndisk_header *) buf;
dyndisk_header = (VHDDynDiskHeader *) buf;
if (strncmp(dyndisk_header->magic, "cxsparse", 8)) {
ret = -EINVAL;
@ -259,6 +269,13 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags)
}
}
if (s->free_data_block_offset > bdrv_getlength(bs->file)) {
error_setg(errp, "block-vpc: free_data_block_offset points after "
"the end of file. The image has been truncated.");
ret = -EINVAL;
goto fail;
}
s->last_bitmap_offset = (int64_t) -1;
#ifdef CACHE
@ -445,7 +462,7 @@ static int vpc_read(BlockDriverState *bs, int64_t sector_num,
int ret;
int64_t offset;
int64_t sectors, sectors_per_block;
struct vhd_footer *footer = (struct vhd_footer *) s->footer_buf;
VHDFooter *footer = (VHDFooter *) s->footer_buf;
if (cpu_to_be32(footer->type) == VHD_FIXED) {
return bdrv_read(bs->file, sector_num, buf, nb_sectors);
@ -494,7 +511,7 @@ static int vpc_write(BlockDriverState *bs, int64_t sector_num,
int64_t offset;
int64_t sectors, sectors_per_block;
int ret;
struct vhd_footer *footer = (struct vhd_footer *) s->footer_buf;
VHDFooter *footer = (VHDFooter *) s->footer_buf;
if (cpu_to_be32(footer->type) == VHD_FIXED) {
return bdrv_write(bs->file, sector_num, buf, nb_sectors);
@ -596,8 +613,8 @@ static int calculate_geometry(int64_t total_sectors, uint16_t* cyls,
static int create_dynamic_disk(int fd, uint8_t *buf, int64_t total_sectors)
{
struct vhd_dyndisk_header* dyndisk_header =
(struct vhd_dyndisk_header*) buf;
VHDDynDiskHeader *dyndisk_header =
(VHDDynDiskHeader *) buf;
size_t block_size, num_bat_entries;
int i;
int ret = -EIO;
@ -683,10 +700,11 @@ static int create_fixed_disk(int fd, uint8_t *buf, int64_t total_size)
return ret;
}
static int vpc_create(const char *filename, QEMUOptionParameter *options)
static int vpc_create(const char *filename, QEMUOptionParameter *options,
Error **errp)
{
uint8_t buf[1024];
struct vhd_footer *footer = (struct vhd_footer *) buf;
VHDFooter *footer = (VHDFooter *) buf;
QEMUOptionParameter *disk_type_param;
int fd, i;
uint16_t cyls = 0;
@ -789,7 +807,7 @@ static int vpc_create(const char *filename, QEMUOptionParameter *options)
static int vpc_has_zero_init(BlockDriverState *bs)
{
BDRVVPCState *s = bs->opaque;
struct vhd_footer *footer = (struct vhd_footer *) s->footer_buf;
VHDFooter *footer = (VHDFooter *) s->footer_buf;
if (cpu_to_be32(footer->type) == VHD_FIXED) {
return bdrv_has_zero_init(bs->file);


@ -1065,7 +1065,8 @@ static void vvfat_parse_filename(const char *filename, QDict *options,
qdict_put(options, "rw", qbool_from_int(rw));
}
static int vvfat_open(BlockDriverState *bs, QDict *options, int flags)
static int vvfat_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVVVFATState *s = bs->opaque;
int cyls, heads, secs;
@ -2874,16 +2875,17 @@ static coroutine_fn int vvfat_co_write(BlockDriverState *bs, int64_t sector_num,
return ret;
}
static int coroutine_fn vvfat_co_is_allocated(BlockDriverState *bs,
static int64_t coroutine_fn vvfat_co_get_block_status(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, int* n)
{
BDRVVVFATState* s = bs->opaque;
*n = s->sector_count - sector_num;
if (*n > nb_sectors)
*n = nb_sectors;
else if (*n < 0)
return 0;
return 1;
if (*n > nb_sectors) {
*n = nb_sectors;
} else if (*n < 0) {
return 0;
}
return BDRV_BLOCK_DATA;
}
static int write_target_commit(BlockDriverState *bs, int64_t sector_num,
@ -2894,7 +2896,7 @@ static int write_target_commit(BlockDriverState *bs, int64_t sector_num,
static void write_target_close(BlockDriverState *bs) {
BDRVVVFATState* s = *((BDRVVVFATState**) bs->opaque);
bdrv_delete(s->qcow);
bdrv_unref(s->qcow);
g_free(s->qcow_filename);
}
@ -2908,6 +2910,7 @@ static int enable_write_target(BDRVVVFATState *s)
{
BlockDriver *bdrv_qcow;
QEMUOptionParameter *options;
Error *local_err = NULL;
int ret;
int size = sector2cluster(s, s->sector_count);
s->used_clusters = calloc(size, 1);
@ -2925,17 +2928,22 @@ static int enable_write_target(BDRVVVFATState *s)
set_option_parameter_int(options, BLOCK_OPT_SIZE, s->sector_count * 512);
set_option_parameter(options, BLOCK_OPT_BACKING_FILE, "fat:");
ret = bdrv_create(bdrv_qcow, s->qcow_filename, options);
ret = bdrv_create(bdrv_qcow, s->qcow_filename, options, &local_err);
if (ret < 0) {
qerror_report_err(local_err);
error_free(local_err);
goto err;
}
s->qcow = bdrv_new("");
ret = bdrv_open(s->qcow, s->qcow_filename, NULL,
BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH, bdrv_qcow);
BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH, bdrv_qcow,
&local_err);
if (ret < 0) {
bdrv_delete(s->qcow);
qerror_report_err(local_err);
error_free(local_err);
bdrv_unref(s->qcow);
goto err;
}
@ -2943,7 +2951,7 @@ static int enable_write_target(BDRVVVFATState *s)
unlink(s->qcow_filename);
#endif
s->bs->backing_hd = calloc(sizeof(BlockDriverState), 1);
s->bs->backing_hd = bdrv_new("");
s->bs->backing_hd->drv = &vvfat_write_target;
s->bs->backing_hd->opaque = g_malloc(sizeof(void*));
*(void**)s->bs->backing_hd->opaque = s;
@ -2984,7 +2992,7 @@ static BlockDriver bdrv_vvfat = {
.bdrv_read = vvfat_co_read,
.bdrv_write = vvfat_co_write,
.bdrv_co_is_allocated = vvfat_co_is_allocated,
.bdrv_co_get_block_status = vvfat_co_get_block_status,
};
static void bdrv_vvfat_init(void)


@ -105,13 +105,6 @@ static void win32_aio_completion_cb(EventNotifier *e)
}
}
static int win32_aio_flush_cb(EventNotifier *e)
{
QEMUWin32AIOState *s = container_of(e, QEMUWin32AIOState, e);
return (s->count > 0) ? 1 : 0;
}
static void win32_aio_cancel(BlockDriverAIOCB *blockacb)
{
QEMUWin32AIOCB *waiocb = (QEMUWin32AIOCB *)blockacb;
@ -201,8 +194,7 @@ QEMUWin32AIOState *win32_aio_init(void)
goto out_close_efd;
}
qemu_aio_set_event_notifier(&s->e, win32_aio_completion_cb,
win32_aio_flush_cb);
qemu_aio_set_event_notifier(&s->e, win32_aio_completion_cb);
return s;


@ -69,12 +69,6 @@ static void nbd_close_notifier(Notifier *n, void *data)
g_free(cn);
}
static void nbd_server_put_ref(NBDExport *exp)
{
BlockDriverState *bs = nbd_export_get_blockdev(exp);
drive_put_ref(drive_get_by_blockdev(bs));
}
void qmp_nbd_server_add(const char *device, bool has_writable, bool writable,
Error **errp)
{
@ -105,11 +99,9 @@ void qmp_nbd_server_add(const char *device, bool has_writable, bool writable,
writable = false;
}
exp = nbd_export_new(bs, 0, -1, writable ? 0 : NBD_FLAG_READ_ONLY,
nbd_server_put_ref);
exp = nbd_export_new(bs, 0, -1, writable ? 0 : NBD_FLAG_READ_ONLY, NULL);
nbd_export_set_name(exp, device);
drive_get_ref(drive_get_by_blockdev(bs));
n = g_malloc0(sizeof(NBDCloseNotifier));
n->n.notify = nbd_close_notifier;

1287
blockdev.c

File diff suppressed because it is too large


@ -35,7 +35,7 @@
#include "qmp-commands.h"
#include "qemu/timer.h"
void *block_job_create(const BlockJobType *job_type, BlockDriverState *bs,
void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
int64_t speed, BlockDriverCompletionFunc *cb,
void *opaque, Error **errp)
{
@ -45,10 +45,11 @@ void *block_job_create(const BlockJobType *job_type, BlockDriverState *bs,
error_set(errp, QERR_DEVICE_IN_USE, bdrv_get_device_name(bs));
return NULL;
}
bdrv_ref(bs);
bdrv_set_in_use(bs, 1);
job = g_malloc0(job_type->instance_size);
job->job_type = job_type;
job = g_malloc0(driver->instance_size);
job->driver = driver;
job->bs = bs;
job->cb = cb;
job->opaque = opaque;
@ -86,11 +87,11 @@ void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
{
Error *local_err = NULL;
if (!job->job_type->set_speed) {
if (!job->driver->set_speed) {
error_set(errp, QERR_NOT_SUPPORTED);
return;
}
job->job_type->set_speed(job, speed, &local_err);
job->driver->set_speed(job, speed, &local_err);
if (error_is_set(&local_err)) {
error_propagate(errp, local_err);
return;
@ -101,12 +102,12 @@ void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
void block_job_complete(BlockJob *job, Error **errp)
{
if (job->paused || job->cancelled || !job->job_type->complete) {
if (job->paused || job->cancelled || !job->driver->complete) {
error_set(errp, QERR_BLOCK_JOB_NOT_READY, job->bs->device_name);
return;
}
job->job_type->complete(job, errp);
job->driver->complete(job, errp);
}
void block_job_pause(BlockJob *job)
@ -142,8 +143,8 @@ bool block_job_is_cancelled(BlockJob *job)
void block_job_iostatus_reset(BlockJob *job)
{
job->iostatus = BLOCK_DEVICE_IO_STATUS_OK;
if (job->job_type->iostatus_reset) {
job->job_type->iostatus_reset(job);
if (job->driver->iostatus_reset) {
job->driver->iostatus_reset(job);
}
}
@ -187,7 +188,7 @@ int block_job_cancel_sync(BlockJob *job)
return (data.cancelled && data.ret == 0) ? -ECANCELED : data.ret;
}
void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns)
void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
{
assert(job->busy);
@ -200,7 +201,7 @@ void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns)
if (block_job_is_paused(job)) {
qemu_coroutine_yield();
} else {
co_sleep_ns(clock, ns);
co_sleep_ns(type, ns);
}
job->busy = true;
}
@ -208,7 +209,7 @@ void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns)
BlockJobInfo *block_job_query(BlockJob *job)
{
BlockJobInfo *info = g_new0(BlockJobInfo, 1);
info->type = g_strdup(job->job_type->job_type);
info->type = g_strdup(BlockJobType_lookup[job->driver->job_type]);
info->device = g_strdup(bdrv_get_device_name(job->bs));
info->len = job->len;
info->busy = job->busy;
@ -235,7 +236,7 @@ QObject *qobject_from_block_job(BlockJob *job)
"'len': %" PRId64 ","
"'offset': %" PRId64 ","
"'speed': %" PRId64 " }",
job->job_type->job_type,
BlockJobType_lookup[job->driver->job_type],
bdrv_get_device_name(job->bs),
job->len,
job->offset,


@ -323,9 +323,9 @@ abi_long copy_from_user(void *hptr, abi_ulong gaddr, size_t len);
abi_long copy_to_user(abi_ulong gaddr, void *hptr, size_t len);
/* Functions for accessing guest memory. The tget and tput functions
read/write single values, byteswapping as necessary. The lock_user
read/write single values, byteswapping as necessary. The lock_user function
gets a pointer to a contiguous area of guest memory, but does not perform
and byteswapping. lock_user may return either a pointer to the guest
any byteswapping. lock_user may return either a pointer to the guest
memory, or a temporary buffer. */
/* Lock an area of guest memory into the host. If copy is true then the
@ -381,7 +381,7 @@ static inline void *lock_user_string(abi_ulong guest_addr)
return lock_user(VERIFY_READ, guest_addr, (long)(len + 1), 1);
}
/* Helper macros for locking/ulocking a target struct. */
/* Helper macros for locking/unlocking a target struct. */
#define lock_user_struct(type, host_ptr, guest_addr, copy) \
(host_ptr = lock_user(type, guest_addr, sizeof(*host_ptr), copy))
#define unlock_user_struct(host_ptr, guest_addr, copy) \

416
configure vendored

@ -27,6 +27,19 @@ printf " '%s'" "$0" "$@" >> config.log
echo >> config.log
echo "#" >> config.log
# Save the configure command line for later reuse.
cat <<EOD >config.status
#!/bin/sh
# Generated by configure.
# Run this file to recreate the current configuration.
# Compiler output produced by configure, useful for debugging
# configure, is in config.log if it exists.
EOD
printf "exec" >>config.status
printf " '%s'" "$0" "$@" >>config.status
echo >>config.status
chmod +x config.status
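The lines above make configure write its own invocation into config.status so the same configuration can be replayed later. The quoting pattern can be sketched in isolation as follows; `save_cmdline` and its output-file argument are hypothetical names used only for illustration and do not exist in configure.

```sh
#!/bin/sh
# Sketch only: save_cmdline is a hypothetical helper, not part of configure.
# Usage: save_cmdline OUTFILE COMMAND [ARG...]
save_cmdline() {
    out=$1
    shift
    {
        echo '#!/bin/sh'
        printf 'exec'
        # printf reuses its format string for every remaining argument,
        # so each word of the command line ends up in its own single quotes.
        printf " '%s'" "$@"
        echo
    } > "$out"
    chmod +x "$out"
}
```

For example, `save_cmdline replay.sh echo "hello world" --flag=1` writes `exec 'echo' 'hello world' '--flag=1'` as the second line of replay.sh. As with the pattern in the hunk, an argument that itself contains a single quote is not escaped, so this is only safe for typical option strings.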
error_exit() {
echo
echo "ERROR: $1"
@ -119,6 +132,7 @@ path_of() {
# default parameters
source_path=`dirname "$0"`
cpu=""
iasl="iasl"
interp_prefix="/usr/gnemul/qemu-%M"
static="no"
cross_prefix=""
@ -215,7 +229,6 @@ linux_user="no"
bsd_user="no"
guest_base="yes"
uname_release=""
mixemu="no"
aix="no"
blobs="yes"
pkgversion=""
@ -232,9 +245,13 @@ usb_redir=""
opengl=""
zlib="yes"
guest_agent=""
guest_agent_with_vss="no"
vss_win32_sdk=""
win_sdk="no"
want_tools="yes"
libiscsi=""
coroutine=""
coroutine_pool=""
seccomp=""
glusterfs=""
glusterfs_discard="no"
@ -243,6 +260,7 @@ gtk=""
gtkabi="2.0"
tpm="no"
libssh2=""
vhdx=""
# parse CC options first
for opt do
@ -252,6 +270,8 @@ for opt do
;;
--cc=*) CC="$optarg"
;;
--cxx=*) CXX="$optarg"
;;
--source-path=*) source_path="$optarg"
;;
--cpu=*) cpu="$optarg"
@ -282,6 +302,12 @@ else
cc="${CC-${cross_prefix}gcc}"
fi
if test -z "${CXX}${cross_prefix}"; then
cxx="c++"
else
cxx="${CXX-${cross_prefix}g++}"
fi
ar="${AR-${cross_prefix}ar}"
as="${AS-${cross_prefix}as}"
cpp="${CPP-$cc -E}"
@ -297,6 +323,9 @@ query_pkg_config() {
pkg_config=query_pkg_config
sdl_config="${SDL_CONFIG-${cross_prefix}sdl-config}"
# If the user hasn't specified ARFLAGS, default to 'rv', just as make does.
ARFLAGS="${ARFLAGS-rv}"
# default flags for all hosts
QEMU_CFLAGS="-fno-strict-aliasing $QEMU_CFLAGS"
QEMU_CFLAGS="-Wall -Wundef -Wwrite-strings -Wmissing-prototypes $QEMU_CFLAGS"
@ -362,7 +391,11 @@ if test ! -z "$cpu" ; then
elif check_define __i386__ ; then
cpu="i386"
elif check_define __x86_64__ ; then
cpu="x86_64"
if check_define __ILP32__ ; then
cpu="x32"
else
cpu="x86_64"
fi
elif check_define __sparc__ ; then
if check_define __arch64__ ; then
cpu="sparc64"
@ -399,7 +432,7 @@ ARCH=
# Normalise host CPU name and set ARCH.
# Note that this case should only have supported host CPUs, not guests.
case "$cpu" in
ia64|ppc|ppc64|s390|s390x|sparc64)
ia64|ppc|ppc64|s390|s390x|sparc64|x32)
cpu="$cpu"
;;
i386|i486|i586|i686|i86pc|BePC)
@ -414,9 +447,6 @@ case "$cpu" in
aarch64)
cpu="aarch64"
;;
hppa|parisc|parisc64)
cpu="hppa"
;;
mips*)
cpu="mips"
;;
@ -546,11 +576,10 @@ Haiku)
audio_possible_drivers="oss alsa sdl esd pa"
linux="yes"
linux_user="yes"
usb="linux"
kvm="yes"
vhost_net="yes"
vhost_scsi="yes"
if [ "$cpu" = "i386" -o "$cpu" = "x86_64" ] ; then
if [ "$cpu" = "i386" -o "$cpu" = "x86_64" -o "$cpu" = "x32" ] ; then
audio_possible_drivers="$audio_possible_drivers fmod"
fi
QEMU_INCLUDES="-I\$(SRC_PATH)/linux-headers -I$(pwd)/linux-headers $QEMU_INCLUDES"
@ -559,9 +588,6 @@ esac
if [ "$bsd" = "yes" ] ; then
if [ "$darwin" != "yes" ] ; then
if [ "$targetos" != "FreeBSD" ]; then
usb="bsd"
fi
bsd_user="yes"
fi
fi
@ -622,6 +648,10 @@ for opt do
;;
--host-cc=*) host_cc="$optarg"
;;
--cxx=*)
;;
--iasl=*) iasl="$optarg"
;;
--objcc=*) objcc="$optarg"
;;
--make=*) make="$optarg"
@ -855,8 +885,6 @@ for opt do
;;
--enable-fdt) fdt="yes"
;;
--enable-mixemu) mixemu="yes"
;;
--disable-linux-aio) linux_aio="no"
;;
--enable-linux-aio) linux_aio="yes"
@ -871,6 +899,10 @@ for opt do
;;
--with-coroutine=*) coroutine="$optarg"
;;
--disable-coroutine-pool) coroutine_pool="no"
;;
--enable-coroutine-pool) coroutine_pool="yes"
;;
--disable-docs) docs="no"
;;
--enable-docs) docs="yes"
@ -913,6 +945,18 @@ for opt do
;;
--disable-guest-agent) guest_agent="no"
;;
--with-vss-sdk) vss_win32_sdk=""
;;
--with-vss-sdk=*) vss_win32_sdk="$optarg"
;;
--without-vss-sdk) vss_win32_sdk="no"
;;
--with-win-sdk) win_sdk=""
;;
--with-win-sdk=*) win_sdk="$optarg"
;;
--without-win-sdk) win_sdk="no"
;;
--enable-tools) want_tools="yes"
;;
--disable-tools) want_tools="no"
@ -945,12 +989,24 @@ for opt do
;;
--enable-libssh2) libssh2="yes"
;;
--enable-vhdx) vhdx="yes"
;;
--disable-vhdx) vhdx="no"
;;
*) echo "ERROR: unknown option $opt"; show_help="yes"
;;
esac
done
case "$cpu" in
ppc)
CPU_CFLAGS="-m32"
LDFLAGS="-m32 $LDFLAGS"
;;
ppc64)
CPU_CFLAGS="-m64"
LDFLAGS="-m64 $LDFLAGS"
;;
sparc)
LDFLAGS="-m32 $LDFLAGS"
CPU_CFLAGS="-m32 -mcpu=ultrasparc"
@ -977,6 +1033,11 @@ case "$cpu" in
LDFLAGS="-m64 $LDFLAGS"
cc_i386='$(CC) -m32'
;;
x32)
CPU_CFLAGS="-mx32"
LDFLAGS="-mx32 $LDFLAGS"
cc_i386='$(CC) -m32'
;;
# No special flags required for other host CPUs
esac
@ -1021,8 +1082,10 @@ echo "Advanced options (experts only):"
echo " --source-path=PATH path of source code [$source_path]"
echo " --cross-prefix=PREFIX use PREFIX for compile tools [$cross_prefix]"
echo " --cc=CC use C compiler CC [$cc]"
echo " --iasl=IASL use ACPI compiler IASL [$iasl]"
echo " --host-cc=CC use C compiler CC [$host_cc] for code run at"
echo " build time"
echo " --cxx=CXX use C++ compiler CXX [$cxx]"
echo " --objcc=OBJCC use Objective-C compiler OBJCC [$objcc]"
echo " --extra-cflags=CFLAGS append extra C compiler flags QEMU_CFLAGS"
echo " --extra-ldflags=LDFLAGS append extra linker flags LDFLAGS"
@ -1067,7 +1130,6 @@ echo " (affects only QEMU, not qemu-img)"
echo " --block-drv-ro-whitelist=L"
echo " set block driver read-only whitelist"
echo " (affects only QEMU, not qemu-img)"
echo " --enable-mixemu enable mixer emulation"
echo " --disable-xen disable xen backend driver support"
echo " --enable-xen enable xen backend driver support"
echo " --disable-xen-pci-passthrough"
@ -1148,10 +1210,14 @@ echo " --disable-usb-redir disable usb network redirection support"
echo " --enable-usb-redir enable usb network redirection support"
echo " --disable-guest-agent disable building of the QEMU Guest Agent"
echo " --enable-guest-agent enable building of the QEMU Guest Agent"
echo " --with-vss-sdk=SDK-path enable Windows VSS support in QEMU Guest Agent"
echo " --with-win-sdk=SDK-path path to Windows Platform SDK (to build VSS .tlb)"
echo " --disable-seccomp disable seccomp support"
echo " --enable-seccomp enables seccomp support"
echo " --with-coroutine=BACKEND coroutine backend. Supported options:"
echo " gthread, ucontext, sigaltstack, windows"
echo " --disable-coroutine-pool disable coroutine freelist (worse performance)"
echo " --enable-coroutine-pool enable coroutine freelist (better performance)"
echo " --enable-glusterfs enable GlusterFS backend"
echo " --disable-glusterfs disable GlusterFS backend"
echo " --enable-gcov enable test coverage analysis with gcov"
@ -1159,6 +1225,8 @@ echo " --gcov=GCOV use specified gcov [$gcov_tool]"
echo " --enable-tpm enable TPM support"
echo " --disable-libssh2 disable ssh block device support"
echo " --enable-libssh2 enable ssh block device support"
echo " --disable-vhdx disables support for the Microsoft VHDX image format"
echo " --enable-vhdx enable support for the Microsoft VHDX image format"
echo ""
echo "NOTE: The object files are built at the place where configure is launched"
exit 1
@ -1204,6 +1272,7 @@ gcc_flags="-Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers $gcc_
gcc_flags="-Wmissing-include-dirs -Wempty-body -Wnested-externs $gcc_flags"
gcc_flags="-Wendif-labels $gcc_flags"
gcc_flags="-Wno-initializer-overrides $gcc_flags"
gcc_flags="-Wno-string-plus-int $gcc_flags"
# Note that we do not add -Werror to gcc_flags here, because that would
# enable it for all configure tests. If a configure test failed due
# to -Werror this would just silently disable some features,
@ -1251,7 +1320,7 @@ fi
if test "$pie" = ""; then
case "$cpu-$targetos" in
i386-Linux|x86_64-Linux|i386-OpenBSD|x86_64-OpenBSD)
i386-Linux|x86_64-Linux|x32-Linux|i386-OpenBSD|x86_64-OpenBSD)
;;
*)
pie="no"
@ -1348,12 +1417,19 @@ fi
# Note that if the Python conditional here evaluates True we will exit
# with status 1 which is a shell 'false' value.
if ! "$python" -c 'import sys; sys.exit(sys.version_info < (2,4) or sys.version_info >= (3,))'; then
if ! $python -c 'import sys; sys.exit(sys.version_info < (2,4) or sys.version_info >= (3,))'; then
error_exit "Cannot use '$python', Python 2.4 or later is required." \
"Note that Python 3 or later is not yet supported." \
"Use --python=/path/to/python to specify a supported Python."
fi
# The -B switch was added in Python 2.6.
# If it is supplied, compiled files are not written.
# Use it for Python versions which support it.
if $python -B -c 'import sys; sys.exit(0)' 2>/dev/null; then
python="$python -B"
fi
if test -z "${target_list+xxx}" ; then
target_list="$default_target_list"
else
@ -1387,39 +1463,27 @@ feature_not_found() {
"configure was not able to find it"
}
if test -z "$cross_prefix" ; then
# ---
# big/little endian test
cat > $TMPC << EOF
#include <inttypes.h>
int main(void) {
volatile uint32_t i=0x01234567;
return (*((uint8_t*)(&i))) == 0x67;
short big_endian[] = { 0x4269, 0x4765, 0x4e64, 0x4961, 0x4e00, 0, };
short little_endian[] = { 0x694c, 0x7454, 0x654c, 0x6e45, 0x6944, 0x6e41, 0, };
extern int foo(short *, short *);
int main(int argc, char *argv[]) {
return foo(big_endian, little_endian);
}
EOF
if compile_prog "" "" ; then
$TMPE && bigendian="yes"
else
echo big/little test failed
fi
else
# if cross compiling, cannot launch a program, so make a static guess
case "$cpu" in
arm)
# ARM can be either way; ask the compiler which one we are
if check_define __ARMEB__; then
bigendian=yes
if compile_object ; then
if grep -q BiGeNdIaN $TMPO ; then
bigendian="yes"
elif grep -q LiTtLeEnDiAn $TMPO ; then
bigendian="no"
else
echo big/little test failed
fi
;;
hppa|m68k|mips|mips64|ppc|ppc64|s390|s390x|sparc|sparc64)
bigendian=yes
;;
esac
else
echo big/little test failed
fi
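The new endianness probe no longer runs a test program. It compiles two arrays of 16-bit values and greps the object file for a marker string, which works when cross compiling. A rough sketch of why this works is shown below: the same 16-bit words spell a readable marker only when laid out in the matching byte order. `decode_words` is a hypothetical helper written for this illustration and is not part of configure.

```sh
#!/bin/sh
# Sketch only: decode_words is a hypothetical helper, not part of configure.
# Usage: decode_words big|little WORD...
# Prints the characters that the given 16-bit words occupy in memory
# for the chosen byte order, skipping NUL bytes.
decode_words() {
    order=$1
    shift
    for w in "$@"; do
        hi=$(( ($w >> 8) & 255 ))
        lo=$(( $w & 255 ))
        if test "$order" = "big"; then
            bytes="$hi $lo"      # most significant byte stored first
        else
            bytes="$lo $hi"      # least significant byte stored first
        fi
        for b in $bytes; do
            if test "$b" -ne 0; then
                # Turn the byte value into an octal escape and print it.
                printf "\\$(printf '%03o' "$b")"
            fi
        done
    done
    echo
}
```

With the values from the hunk, `decode_words big 0x4269 0x4765 0x4e64 0x4961 0x4e00` prints `BiGeNdIaN` and `decode_words little 0x694c 0x7454 0x654c 0x6e45 0x6944 0x6e41` prints `LiTtLeEnDiAn`. Decoding the first array in little-endian order gives `iBeGdNaIN`, which matches neither marker, so only one of the two grep patterns can match for a given host.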
##########################################
@ -1469,7 +1533,7 @@ libs_softmmu="$libs_softmmu -lz"
# libseccomp check
if test "$seccomp" != "no" ; then
if $pkg_config --atleast-version=2.1.0 libseccomp --modversion >/dev/null 2>&1; then
if $pkg_config --atleast-version=2.1.0 libseccomp; then
libs_softmmu="$libs_softmmu `$pkg_config --libs libseccomp`"
QEMU_CFLAGS="$QEMU_CFLAGS `$pkg_config --cflags libseccomp`"
seccomp="yes"
@ -1703,10 +1767,10 @@ if test "$gtk" != "no"; then
fi
gtk="no"
else
gtk_cflags=`$pkg_config --cflags $gtkpackage 2>/dev/null`
gtk_libs=`$pkg_config --libs $gtkpackage 2>/dev/null`
vte_cflags=`$pkg_config --cflags $vtepackage 2>/dev/null`
vte_libs=`$pkg_config --libs $vtepackage 2>/dev/null`
gtk_cflags=`$pkg_config --cflags $gtkpackage`
gtk_libs=`$pkg_config --libs $gtkpackage`
vte_cflags=`$pkg_config --cflags $vtepackage`
vte_libs=`$pkg_config --libs $vtepackage`
libs_softmmu="$gtk_libs $vte_libs $libs_softmmu"
gtk="yes"
fi
@ -1721,7 +1785,7 @@ if test "`basename $sdl_config`" != sdl-config && ! has ${sdl_config}; then
sdl_config=sdl-config
fi
if $pkg_config sdl --modversion >/dev/null 2>&1; then
if $pkg_config sdl --exists; then
sdlconfig="$pkg_config sdl"
_sdlversion=`$sdlconfig --modversion 2>/dev/null | sed 's/[^0-9]//g'`
elif has ${sdl_config}; then
@ -1907,9 +1971,9 @@ int main(void) {
return png_ptr != 0;
}
EOF
if $pkg_config libpng --modversion >/dev/null 2>&1; then
vnc_png_cflags=`$pkg_config libpng --cflags 2> /dev/null`
vnc_png_libs=`$pkg_config libpng --libs 2> /dev/null`
if $pkg_config libpng --exists; then
vnc_png_cflags=`$pkg_config libpng --cflags`
vnc_png_libs=`$pkg_config libpng --libs`
else
vnc_png_cflags=""
vnc_png_libs="-lpng"
@ -1970,6 +2034,18 @@ EOF
fi
fi
if test "$vhdx" = "yes" ; then
if test "$uuid" = "no" ; then
error_exit "uuid required for VHDX support"
fi
elif test "$vhdx" != "no" ; then
if test "$uuid" = "yes" ; then
vhdx=yes
else
vhdx=no
fi
fi
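The VHDX block above follows the usual configure tristate: an explicit "yes" is a hard requirement, an explicit "no" is respected, and an empty value means auto-detect based on the uuid probe. A minimal sketch of that decision table, assuming a hypothetical function `resolve_feature` that does not exist in configure:

```sh
#!/bin/sh
# Sketch only: resolve_feature is a hypothetical helper, not part of configure.
# Usage: resolve_feature REQUEST DEPENDENCY
#   REQUEST    "yes", "no" or "" (auto-detect)
#   DEPENDENCY "yes" or "no" (result of an earlier probe, e.g. uuid)
# Prints the resolved value, or fails if a hard request cannot be met.
resolve_feature() {
    request=$1
    dep=$2
    if test "$request" = "yes"; then
        if test "$dep" = "no"; then
            echo "ERROR: dependency required for requested feature" >&2
            return 1
        fi
        echo yes
    elif test "$request" != "no"; then
        # Auto-detect: enable the feature only if the dependency was found.
        if test "$dep" = "yes"; then
            echo yes
        else
            echo no
        fi
    else
        echo no
    fi
}
```

For instance, `resolve_feature "" yes` prints `yes`, `resolve_feature "" no` prints `no`, and `resolve_feature yes no` fails with an error, mirroring the `error_exit "uuid required for VHDX support"` path.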
##########################################
# xfsctl() probe, used for raw-posix
if test "$xfs" != "no" ; then
@ -2186,7 +2262,7 @@ fi
##########################################
# curl probe
if test "$curl" != "no" ; then
if $pkg_config libcurl --modversion >/dev/null 2>&1; then
if $pkg_config libcurl --exists; then
curlconfig="$pkg_config libcurl"
else
curlconfig=curl-config
@ -2238,10 +2314,9 @@ if test "$mingw32" = yes; then
else
glib_req_ver=2.12
fi
if $pkg_config --atleast-version=$glib_req_ver gthread-2.0 > /dev/null 2>&1
then
glib_cflags=`$pkg_config --cflags gthread-2.0 2>/dev/null`
glib_libs=`$pkg_config --libs gthread-2.0 2>/dev/null`
if $pkg_config --atleast-version=$glib_req_ver gthread-2.0; then
glib_cflags=`$pkg_config --cflags gthread-2.0`
glib_libs=`$pkg_config --libs gthread-2.0`
LIBS="$glib_libs $LIBS"
libs_qga="$glib_libs $libs_qga"
else
@ -2270,8 +2345,8 @@ if test "$pixman" = "none"; then
pixman_cflags=
pixman_libs=
elif test "$pixman" = "system"; then
pixman_cflags=`$pkg_config --cflags pixman-1 2>/dev/null`
pixman_libs=`$pkg_config --libs pixman-1 2>/dev/null`
pixman_cflags=`$pkg_config --cflags pixman-1`
pixman_libs=`$pkg_config --libs pixman-1`
else
if test ! -d ${source_path}/pixman/pixman; then
error_exit "pixman not present. Your options:" \
@ -2370,8 +2445,7 @@ fi
# libssh2 probe
min_libssh2_version=1.2.8
if test "$libssh2" != "no" ; then
if $pkg_config --atleast-version=$min_libssh2_version libssh2 >/dev/null 2>&1
then
if $pkg_config --atleast-version=$min_libssh2_version libssh2; then
libssh2_cflags=`$pkg_config libssh2 --cflags`
libssh2_libs=`$pkg_config libssh2 --libs`
libssh2=yes
@ -2514,7 +2588,7 @@ fi
fdt_required=no
for target in $target_list; do
case $target in
arm*-softmmu|ppc*-softmmu|microblaze*-softmmu)
aarch64*-softmmu|arm*-softmmu|ppc*-softmmu|microblaze*-softmmu)
fdt_required=yes
;;
esac
@ -2613,14 +2687,14 @@ fi
##########################################
# glusterfs probe
if test "$glusterfs" != "no" ; then
if $pkg_config --atleast-version=3 glusterfs-api >/dev/null 2>&1; then
if $pkg_config --atleast-version=3 glusterfs-api; then
glusterfs="yes"
glusterfs_cflags=`$pkg_config --cflags glusterfs-api 2>/dev/null`
glusterfs_libs=`$pkg_config --libs glusterfs-api 2>/dev/null`
glusterfs_cflags=`$pkg_config --cflags glusterfs-api`
glusterfs_libs=`$pkg_config --libs glusterfs-api`
CFLAGS="$CFLAGS $glusterfs_cflags"
libs_tools="$glusterfs_libs $libs_tools"
libs_softmmu="$glusterfs_libs $libs_softmmu"
if $pkg_config --atleast-version=5 glusterfs-api >/dev/null 2>&1; then
if $pkg_config --atleast-version=5 glusterfs-api; then
glusterfs_discard="yes"
fi
else
@ -2842,6 +2916,37 @@ if compile_prog "" "" ; then
dup3=yes
fi
# check for ppoll support
ppoll=no
cat > $TMPC << EOF
#include <poll.h>
int main(void)
{
struct pollfd pfd = { .fd = 0, .events = 0, .revents = 0 };
ppoll(&pfd, 1, 0, 0);
return 0;
}
EOF
if compile_prog "" "" ; then
ppoll=yes
fi
# check for prctl(PR_SET_TIMERSLACK , ... ) support
prctl_pr_set_timerslack=no
cat > $TMPC << EOF
#include <sys/prctl.h>
int main(void)
{
prctl(PR_SET_TIMERSLACK, 1, 0, 0, 0);
return 0;
}
EOF
if compile_prog "" "" ; then
prctl_pr_set_timerslack=yes
fi
# check for epoll support
epoll=no
cat > $TMPC << EOF
@ -2952,10 +3057,10 @@ if test "$libiscsi" != "no" ; then
#include <iscsi/iscsi.h>
int main(void) { iscsi_unmap_sync(NULL,0,0,0,NULL,0); return 0; }
EOF
if $pkg_config --atleast-version=1.7.0 libiscsi --modversion >/dev/null 2>&1; then
if $pkg_config --atleast-version=1.7.0 libiscsi; then
libiscsi="yes"
libiscsi_cflags=$($pkg_config --cflags libiscsi 2>/dev/null)
libiscsi_libs=$($pkg_config --libs libiscsi 2>/dev/null)
libiscsi_cflags=$($pkg_config --cflags libiscsi)
libiscsi_libs=$($pkg_config --libs libiscsi)
CFLAGS="$CFLAGS $libiscsi_cflags"
LIBS="$LIBS $libiscsi_libs"
elif compile_prog "" "-liscsi" ; then
@ -3022,8 +3127,8 @@ int main(void) { spice_server_new(); return 0; }
EOF
spice_cflags=$($pkg_config --cflags spice-protocol spice-server 2>/dev/null)
spice_libs=$($pkg_config --libs spice-protocol spice-server 2>/dev/null)
if $pkg_config --atleast-version=0.12.0 spice-server >/dev/null 2>&1 && \
$pkg_config --atleast-version=0.12.3 spice-protocol > /dev/null 2>&1 && \
if $pkg_config --atleast-version=0.12.0 spice-server && \
$pkg_config --atleast-version=0.12.3 spice-protocol && \
compile_prog "$spice_cflags" "$spice_libs" ; then
spice="yes"
libs_softmmu="$libs_softmmu $spice_libs"
@ -3058,7 +3163,7 @@ EOF
test_cflags="-Werror $test_cflags"
fi
if test -n "$libtool" &&
$pkg_config --atleast-version=3.12.8 nss >/dev/null 2>&1 && \
$pkg_config --atleast-version=3.12.8 nss && \
compile_prog "$test_cflags" "$libcacard_libs"; then
smartcard_nss="yes"
QEMU_CFLAGS="$QEMU_CFLAGS $libcacard_cflags"
@ -3074,11 +3179,10 @@ fi
# check for libusb
if test "$libusb" != "no" ; then
if $pkg_config --atleast-version=1.0.13 libusb-1.0 >/dev/null 2>&1 ; then
if $pkg_config --atleast-version=1.0.13 libusb-1.0; then
libusb="yes"
usb="libusb"
libusb_cflags=$($pkg_config --cflags libusb-1.0 2>/dev/null)
libusb_libs=$($pkg_config --libs libusb-1.0 2>/dev/null)
libusb_cflags=$($pkg_config --cflags libusb-1.0)
libusb_libs=$($pkg_config --libs libusb-1.0)
QEMU_CFLAGS="$QEMU_CFLAGS $libusb_cflags"
libs_softmmu="$libs_softmmu $libusb_libs"
else
@ -3091,10 +3195,10 @@ fi
# check for usbredirparser for usb network redirection support
if test "$usb_redir" != "no" ; then
if $pkg_config --atleast-version=0.6 libusbredirparser-0.5 >/dev/null 2>&1 ; then
if $pkg_config --atleast-version=0.6 libusbredirparser-0.5; then
usb_redir="yes"
usb_redir_cflags=$($pkg_config --cflags libusbredirparser-0.5 2>/dev/null)
usb_redir_libs=$($pkg_config --libs libusbredirparser-0.5 2>/dev/null)
usb_redir_cflags=$($pkg_config --cflags libusbredirparser-0.5)
usb_redir_libs=$($pkg_config --libs libusbredirparser-0.5)
QEMU_CFLAGS="$QEMU_CFLAGS $usb_redir_cflags"
libs_softmmu="$libs_softmmu $usb_redir_libs"
else
@ -3105,6 +3209,61 @@ if test "$usb_redir" != "no" ; then
fi
fi
##########################################
# check if we have VSS SDK headers for win
if test "$mingw32" = "yes" -a "$guest_agent" != "no" -a "$vss_win32_sdk" != "no" ; then
case "$vss_win32_sdk" in
"") vss_win32_include="-I$source_path" ;;
*\ *) # The SDK is installed in "Program Files" by default, but we cannot
# handle path with spaces. So we symlink the headers into ".sdk/vss".
vss_win32_include="-I$source_path/.sdk/vss"
symlink "$vss_win32_sdk/inc" "$source_path/.sdk/vss/inc"
;;
*) vss_win32_include="-I$vss_win32_sdk"
esac
cat > $TMPC << EOF
#define __MIDL_user_allocate_free_DEFINED__
#include <inc/win2003/vss.h>
int main(void) { return VSS_CTX_BACKUP; }
EOF
if compile_prog "$vss_win32_include" "" ; then
guest_agent_with_vss="yes"
QEMU_CFLAGS="$QEMU_CFLAGS $vss_win32_include"
libs_qga="-lole32 -loleaut32 -lshlwapi -luuid -lstdc++ -Wl,--enable-stdcall-fixup $libs_qga"
else
if test "$vss_win32_sdk" != "" ; then
echo "ERROR: Please download and install Microsoft VSS SDK:"
echo "ERROR: http://www.microsoft.com/en-us/download/details.aspx?id=23490"
echo "ERROR: On POSIX-systems, you can extract the SDK headers by:"
echo "ERROR: scripts/extract-vsssdk-headers setup.exe"
echo "ERROR: The headers are extracted in the directory \`inc'."
feature_not_found "VSS support"
fi
guest_agent_with_vss="no"
fi
fi
##########################################
# lookup Windows platform SDK (if not specified)
# The SDK is needed only to build .tlb (type library) file of guest agent
# VSS provider from the source. It is usually unnecessary because the
# pre-compiled .tlb file is included.
if test "$mingw32" = "yes" -a "$guest_agent" != "no" -a "$guest_agent_with_vss" = "yes" ; then
if test -z "$win_sdk"; then
programfiles="$PROGRAMFILES"
test -n "$PROGRAMW6432" && programfiles="$PROGRAMW6432"
if test -n "$programfiles"; then
win_sdk=$(ls -d "$programfiles/Microsoft SDKs/Windows/v"* | tail -1) 2>/dev/null
else
feature_not_found "Windows SDK"
fi
elif test "$win_sdk" = "no"; then
win_sdk=""
fi
fi
##########################################
##########################################
@ -3264,6 +3423,17 @@ else
esac
fi
if test "$coroutine_pool" = ""; then
if test "$coroutine" = "gthread"; then
coroutine_pool=no
else
coroutine_pool=yes
fi
fi
if test "$coroutine" = "gthread" -a "$coroutine_pool" = "yes"; then
error_exit "'gthread' coroutine backend does not support pool (use --disable-coroutine-pool)"
fi
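The coroutine pool setting differs from a plain tristate because its default depends on the selected backend, and one explicit combination is rejected. The logic can be sketched as a standalone function; `resolve_coroutine_pool` is a hypothetical name for illustration and is not defined in configure.

```sh
#!/bin/sh
# Sketch only: resolve_coroutine_pool is a hypothetical helper.
# Usage: resolve_coroutine_pool BACKEND POOL
#   BACKEND e.g. "gthread", "ucontext", "sigaltstack", "windows"
#   POOL    "yes", "no" or "" (pick a default)
resolve_coroutine_pool() {
    backend=$1
    pool=$2
    if test "$pool" = ""; then
        # Default: pool on, except for the gthread backend.
        if test "$backend" = "gthread"; then
            pool=no
        else
            pool=yes
        fi
    fi
    if test "$backend" = "gthread" -a "$pool" = "yes"; then
        echo "ERROR: 'gthread' coroutine backend does not support pool" >&2
        return 1
    fi
    echo "$pool"
}
```

`resolve_coroutine_pool ucontext ""` prints `yes`, `resolve_coroutine_pool gthread ""` prints `no`, and `resolve_coroutine_pool gthread yes` fails, which corresponds to the `error_exit` call in the hunk.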
##########################################
# check if we have open_by_handle_at
@ -3404,7 +3574,7 @@ if test "$gcov" = "yes" ; then
CFLAGS="-fprofile-arcs -ftest-coverage -g $CFLAGS"
LDFLAGS="-fprofile-arcs -ftest-coverage $LDFLAGS"
elif test "$debug" = "no" ; then
CFLAGS="-O2 -D_FORTIFY_SOURCE=2 $CFLAGS"
CFLAGS="-O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 $CFLAGS"
fi
@ -3470,8 +3640,11 @@ if test "$softmmu" = yes ; then
fi
fi
if [ "$guest_agent" != "no" ]; then
if [ "$linux" = "yes" -o "$bsd" = "yes" -o "$solaris" = "yes" ] ; then
if [ "$linux" = "yes" -o "$bsd" = "yes" -o "$solaris" = "yes" -o "$mingw32" = "yes" ] ; then
tools="qemu-ga\$(EXESUF) $tools"
if [ "$mingw32" = "yes" -a "$guest_agent_with_vss" = "yes" ]; then
tools="qga/vss-win32/qga-vss.dll qga/vss-win32/qga-vss.tlb $tools"
fi
guest_agent=yes
elif [ "$guest_agent" != yes ]; then
guest_agent=no
@ -3499,7 +3672,7 @@ fi
if test "$pie" = "no" ; then
textseg_addr=
case "$cpu" in
arm | hppa | i386 | m68k | ppc | ppc64 | s390* | sparc | sparc64 | x86_64)
arm | hppa | i386 | m68k | ppc | ppc64 | s390* | sparc | sparc64 | x86_64 | x32)
textseg_addr=0x60000000
;;
mips)
@ -3542,11 +3715,14 @@ echo "Manual directory `eval echo $mandir`"
echo "ELF interp prefix $interp_prefix"
else
echo "local state directory queried at runtime"
echo "Windows SDK $win_sdk"
fi
echo "Source path $source_path"
echo "C compiler $cc"
echo "Host C compiler $host_cc"
echo "C++ compiler $cxx"
echo "Objective-C compiler $objcc"
echo "ARFLAGS $ARFLAGS"
echo "CFLAGS $CFLAGS"
echo "QEMU_CFLAGS $QEMU_CFLAGS"
echo "LDFLAGS $LDFLAGS"
@ -3578,7 +3754,6 @@ echo "mingw32 support $mingw32"
echo "Audio drivers $audio_drv_list"
echo "Block whitelist (rw) $block_drv_rw_whitelist"
echo "Block whitelist (ro) $block_drv_ro_whitelist"
echo "Mixer emulation $mixemu"
echo "VirtFS support $virtfs"
echo "VNC support $vnc"
if test "$vnc" = "yes" ; then
@ -3627,8 +3802,10 @@ echo "usb net redir $usb_redir"
echo "OpenGL support $opengl"
echo "libiscsi support $libiscsi"
echo "build guest agent $guest_agent"
echo "QGA VSS support $guest_agent_with_vss"
echo "seccomp support $seccomp"
echo "coroutine backend $coroutine"
echo "coroutine pool $coroutine_pool"
echo "GlusterFS support $glusterfs"
echo "virtio-blk-data-plane $virtio_blk_data_plane"
echo "gcov $gcov_tool"
@ -3637,6 +3814,7 @@ echo "TPM support $tpm"
echo "libssh2 support $libssh2"
echo "TPM passthrough $tpm_passthrough"
echo "QOM debugging $qom_cast_debug"
echo "vhdx $vhdx"
if test "$sdl_too_old" = "yes"; then
echo "-> Your SDL version is too old - please upgrade to have SDL support"
@ -3647,8 +3825,6 @@ config_host_mak="config-host.mak"
echo "# Automatically generated by configure - do not modify" >config-all-disas.mak
echo "# Automatically generated by configure - do not modify" > $config_host_mak
printf "# Configured with:" >> $config_host_mak
printf " '%s'" "$0" "$@" >> $config_host_mak
echo >> $config_host_mak
echo all: >> $config_host_mak
@ -3673,14 +3849,6 @@ echo "libs_softmmu=$libs_softmmu" >> $config_host_mak
echo "ARCH=$ARCH" >> $config_host_mak
case "$cpu" in
arm|i386|x86_64|ppc|aarch64)
# The TCG interpreter currently does not support ld/st optimization.
if test "$tcg_interpreter" = "no" ; then
echo "CONFIG_QEMU_LDST_OPTIMIZATION=y" >> $config_host_mak
fi
;;
esac
if test "$debug_tcg" = "yes" ; then
echo "CONFIG_DEBUG_TCG=y" >> $config_host_mak
fi
@ -3701,6 +3869,10 @@ if test "$mingw32" = "yes" ; then
version_micro=0
echo "CONFIG_FILEVERSION=$version_major,$version_minor,$version_subminor,$version_micro" >> $config_host_mak
echo "CONFIG_PRODUCTVERSION=$version_major,$version_minor,$version_subminor,$version_micro" >> $config_host_mak
if test "$guest_agent_with_vss" = "yes" ; then
echo "CONFIG_QGA_VSS=y" >> $config_host_mak
echo "WIN_SDK=\"$win_sdk\"" >> $config_host_mak
fi
else
echo "CONFIG_POSIX=y" >> $config_host_mak
fi
@ -3759,9 +3931,6 @@ if test "$audio_win_int" = "yes" ; then
fi
echo "CONFIG_BDRV_RW_WHITELIST=$block_drv_rw_whitelist" >> $config_host_mak
echo "CONFIG_BDRV_RO_WHITELIST=$block_drv_ro_whitelist" >> $config_host_mak
if test "$mixemu" = "yes" ; then
echo "CONFIG_MIXEMU=y" >> $config_host_mak
fi
if test "$vnc" = "yes" ; then
echo "CONFIG_VNC=y" >> $config_host_mak
fi
@ -3838,6 +4007,12 @@ fi
if test "$dup3" = "yes" ; then
echo "CONFIG_DUP3=y" >> $config_host_mak
fi
if test "$ppoll" = "yes" ; then
echo "CONFIG_PPOLL=y" >> $config_host_mak
fi
if test "$prctl_pr_set_timerslack" = "yes" ; then
echo "CONFIG_PRCTL_PR_SET_TIMERSLACK=y" >> $config_host_mak
fi
if test "$epoll" = "yes" ; then
echo "CONFIG_EPOLL=y" >> $config_host_mak
fi
@ -3978,6 +4153,11 @@ if test "$rbd" = "yes" ; then
fi
echo "CONFIG_COROUTINE_BACKEND=$coroutine" >> $config_host_mak
if test "$coroutine_pool" = "yes" ; then
echo "CONFIG_COROUTINE_POOL=1" >> $config_host_mak
else
echo "CONFIG_COROUTINE_POOL=0" >> $config_host_mak
fi
if test "$open_by_handle_at" = "yes" ; then
echo "CONFIG_OPEN_BY_HANDLE=y" >> $config_host_mak
@ -4027,25 +4207,16 @@ if test "$virtio_blk_data_plane" = "yes" ; then
echo 'CONFIG_VIRTIO_BLK_DATA_PLANE=$(CONFIG_VIRTIO)' >> $config_host_mak
fi
if test "$vhdx" = "yes" ; then
echo "CONFIG_VHDX=y" >> $config_host_mak
fi
# USB host support
case "$usb" in
linux)
echo "HOST_USB=linux legacy" >> $config_host_mak
;;
bsd)
echo "HOST_USB=bsd" >> $config_host_mak
;;
libusb)
if test "$linux" = "yes"; then
echo "HOST_USB=libusb linux legacy" >> $config_host_mak
else
echo "HOST_USB=libusb legacy" >> $config_host_mak
fi
;;
*)
if test "$libusb" = "yes"; then
echo "HOST_USB=libusb legacy" >> $config_host_mak
else
echo "HOST_USB=stub" >> $config_host_mak
;;
esac
fi
# TPM passthrough support?
if test "$tpm" = "yes"; then
@ -4103,7 +4274,7 @@ elif test "$ARCH" = "sparc64" ; then
QEMU_INCLUDES="-I\$(SRC_PATH)/tcg/sparc $QEMU_INCLUDES"
elif test "$ARCH" = "s390x" ; then
QEMU_INCLUDES="-I\$(SRC_PATH)/tcg/s390 $QEMU_INCLUDES"
elif test "$ARCH" = "x86_64" ; then
elif test "$ARCH" = "x86_64" -o "$ARCH" = "x32" ; then
QEMU_INCLUDES="-I\$(SRC_PATH)/tcg/i386 $QEMU_INCLUDES"
else
QEMU_INCLUDES="-I\$(SRC_PATH)/tcg/\$(ARCH) $QEMU_INCLUDES"
@ -4125,10 +4296,15 @@ else
fi
echo "PYTHON=$python" >> $config_host_mak
echo "CC=$cc" >> $config_host_mak
if $iasl -h > /dev/null 2>&1; then
echo "IASL=$iasl" >> $config_host_mak
fi
echo "CC_I386=$cc_i386" >> $config_host_mak
echo "HOST_CC=$host_cc" >> $config_host_mak
echo "CXX=$cxx" >> $config_host_mak
echo "OBJCC=$objcc" >> $config_host_mak
echo "AR=$ar" >> $config_host_mak
echo "ARFLAGS=$ARFLAGS" >> $config_host_mak
echo "AS=$as" >> $config_host_mak
echo "CPP=$cpp" >> $config_host_mak
echo "OBJCOPY=$objcopy" >> $config_host_mak
@ -4165,7 +4341,7 @@ fi
if test "$linux" = "yes" ; then
mkdir -p linux-headers
case "$cpu" in
i386|x86_64)
i386|x86_64|x32)
linux_arch=x86
;;
ppcemb|ppc|ppc64)
@ -4251,6 +4427,11 @@ case "$target_name" in
bflt="yes"
gdb_xml_files="arm-core.xml arm-vfp.xml arm-vfp3.xml arm-neon.xml"
;;
aarch64)
TARGET_BASE_ARCH=arm
bflt="yes"
gdb_xml_files="aarch64-core.xml"
;;
cris)
;;
lm32)
@ -4442,7 +4623,7 @@ for i in $ARCH $TARGET_BASE_ARCH ; do
echo "CONFIG_HPPA_DIS=y" >> $config_target_mak
echo "CONFIG_HPPA_DIS=y" >> config-all-disas.mak
;;
i386|x86_64)
i386|x86_64|x32)
echo "CONFIG_I386_DIS=y" >> $config_target_mak
echo "CONFIG_I386_DIS=y" >> config-all-disas.mak
;;
@ -4542,7 +4723,8 @@ if [ "$dtc_internal" = "yes" ]; then
fi
# build tree in object directory in case the source is not in the current directory
DIRS="tests tests/tcg tests/tcg/cris tests/tcg/lm32 tests/libqos tests/qapi-schema tests/tcg/xtensa"
DIRS="tests tests/tcg tests/tcg/cris tests/tcg/lm32 tests/libqos tests/qapi-schema tests/tcg/xtensa tests/qemu-iotests"
DIRS="$DIRS fsdev"
DIRS="$DIRS pc-bios/optionrom pc-bios/spapr-rtas pc-bios/s390-ccw"
DIRS="$DIRS roms/seabios roms/vgabios"
DIRS="$DIRS qapi-generated"
@ -4582,7 +4764,7 @@ for rom in seabios vgabios ; do
echo "BCC=bcc" >> $config_mak
echo "CPP=$cpp" >> $config_mak
echo "OBJCOPY=objcopy" >> $config_mak
echo "IASL=iasl" >> $config_mak
echo "IASL=$iasl" >> $config_mak
echo "LD=$ld" >> $config_mak
done

View File

@ -53,7 +53,7 @@ void cpu_resume_from_signal(CPUArchState *env, void *puc)
static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, uint8_t *tb_ptr)
{
CPUArchState *env = cpu->env_ptr;
tcg_target_ulong next_tb = tcg_qemu_tb_exec(env, tb_ptr);
uintptr_t next_tb = tcg_qemu_tb_exec(env, tb_ptr);
if ((next_tb & TB_EXIT_MASK) > TB_EXIT_IDX1) {
/* We didn't start executing this TB (eg because the instruction
* counter hit zero); we must restore the guest PC to the address
@ -209,7 +209,7 @@ int cpu_exec(CPUArchState *env)
int ret, interrupt_request;
TranslationBlock *tb;
uint8_t *tc_ptr;
tcg_target_ulong next_tb;
uintptr_t next_tb;
if (cpu->halted) {
if (!cpu_has_work(cpu)) {
@ -681,6 +681,10 @@ int cpu_exec(CPUArchState *env)
* local variables as longjmp is marked 'noreturn'. */
cpu = current_cpu;
env = cpu->env_ptr;
#if !(defined(CONFIG_USER_ONLY) && \
(defined(TARGET_M68K) || defined(TARGET_PPC) || defined(TARGET_S390X)))
cc = CPU_GET_CLASS(cpu);
#endif
}
} /* for(;;) */

373
cpus.c

@ -37,6 +37,7 @@
#include "sysemu/qtest.h"
#include "qemu/main-loop.h"
#include "qemu/bitmap.h"
#include "qemu/seqlock.h"
#ifndef _WIN32
#include "qemu/compatfd.h"
@ -62,12 +63,17 @@
static CPUState *next_cpu;
bool cpu_is_stopped(CPUState *cpu)
{
return cpu->stopped || !runstate_is_running();
}
static bool cpu_thread_is_idle(CPUState *cpu)
{
if (cpu->stop || cpu->queued_work_first) {
return false;
}
if (cpu->stopped || !runstate_is_running()) {
if (cpu_is_stopped(cpu)) {
return true;
}
if (!cpu->halted || qemu_cpu_has_work(cpu) ||
@ -81,7 +87,7 @@ static bool all_cpu_threads_idle(void)
{
CPUState *cpu;
for (cpu = first_cpu; cpu != NULL; cpu = cpu->next_cpu) {
CPU_FOREACH(cpu) {
if (!cpu_thread_is_idle(cpu)) {
return false;
}
@ -92,21 +98,32 @@ static bool all_cpu_threads_idle(void)
/***********************************************************/
/* guest cycle counter */
/* Protected by TimersState seqlock */
/* Compensate for varying guest execution speed. */
static int64_t qemu_icount_bias;
static int64_t vm_clock_warp_start;
/* Conversion factor from emulated instructions to virtual clock ticks. */
static int icount_time_shift;
/* Arbitrarily pick 1MIPS as the minimum allowable speed. */
#define MAX_ICOUNT_SHIFT 10
/* Compensate for varying guest execution speed. */
static int64_t qemu_icount_bias;
/* Only written by TCG thread */
static int64_t qemu_icount;
static QEMUTimer *icount_rt_timer;
static QEMUTimer *icount_vm_timer;
static QEMUTimer *icount_warp_timer;
static int64_t vm_clock_warp_start;
static int64_t qemu_icount;
typedef struct TimersState {
/* Protected by BQL. */
int64_t cpu_ticks_prev;
int64_t cpu_ticks_offset;
/* cpu_clock_offset can be read out of BQL, so protect it with
* this lock.
*/
QemuSeqLock vm_clock_seqlock;
int64_t cpu_clock_offset;
int32_t cpu_ticks_enabled;
int64_t dummy;
@ -115,7 +132,7 @@ typedef struct TimersState {
static TimersState timers_state;
/* Return the virtual CPU time, based on the instruction counter. */
int64_t cpu_get_icount(void)
static int64_t cpu_get_icount_locked(void)
{
int64_t icount;
CPUState *cpu = current_cpu;
@ -131,58 +148,100 @@ int64_t cpu_get_icount(void)
return qemu_icount_bias + (icount << icount_time_shift);
}
int64_t cpu_get_icount(void)
{
int64_t icount;
unsigned start;
do {
start = seqlock_read_begin(&timers_state.vm_clock_seqlock);
icount = cpu_get_icount_locked();
} while (seqlock_read_retry(&timers_state.vm_clock_seqlock, start));
return icount;
}
/* return the host CPU cycle counter and handle stop/restart */
/* Caller must hold the BQL */
int64_t cpu_get_ticks(void)
{
int64_t ticks;
if (use_icount) {
return cpu_get_icount();
}
if (!timers_state.cpu_ticks_enabled) {
return timers_state.cpu_ticks_offset;
} else {
int64_t ticks;
ticks = cpu_get_real_ticks();
if (timers_state.cpu_ticks_prev > ticks) {
/* Note: non increasing ticks may happen if the host uses
software suspend */
timers_state.cpu_ticks_offset += timers_state.cpu_ticks_prev - ticks;
}
timers_state.cpu_ticks_prev = ticks;
return ticks + timers_state.cpu_ticks_offset;
ticks = timers_state.cpu_ticks_offset;
if (timers_state.cpu_ticks_enabled) {
ticks += cpu_get_real_ticks();
}
if (timers_state.cpu_ticks_prev > ticks) {
/* Note: non increasing ticks may happen if the host uses
software suspend */
timers_state.cpu_ticks_offset += timers_state.cpu_ticks_prev - ticks;
ticks = timers_state.cpu_ticks_prev;
}
timers_state.cpu_ticks_prev = ticks;
return ticks;
}
static int64_t cpu_get_clock_locked(void)
{
int64_t ticks;
ticks = timers_state.cpu_clock_offset;
if (timers_state.cpu_ticks_enabled) {
ticks += get_clock();
}
return ticks;
}
/* return the host CPU monotonic timer and handle stop/restart */
int64_t cpu_get_clock(void)
{
int64_t ti;
if (!timers_state.cpu_ticks_enabled) {
return timers_state.cpu_clock_offset;
} else {
ti = get_clock();
return ti + timers_state.cpu_clock_offset;
}
unsigned start;
do {
start = seqlock_read_begin(&timers_state.vm_clock_seqlock);
ti = cpu_get_clock_locked();
} while (seqlock_read_retry(&timers_state.vm_clock_seqlock, start));
return ti;
}
/* enable cpu_get_ticks() */
/* enable cpu_get_ticks()
* Caller must hold the BQL, which serves as the mutex for vm_clock_seqlock.
*/
void cpu_enable_ticks(void)
{
/* Here, the real thing protected by the seqlock is cpu_clock_offset. */
seqlock_write_lock(&timers_state.vm_clock_seqlock);
if (!timers_state.cpu_ticks_enabled) {
timers_state.cpu_ticks_offset -= cpu_get_real_ticks();
timers_state.cpu_clock_offset -= get_clock();
timers_state.cpu_ticks_enabled = 1;
}
seqlock_write_unlock(&timers_state.vm_clock_seqlock);
}
/* disable cpu_get_ticks() : the clock is stopped. You must not call
cpu_get_ticks() after that. */
* cpu_get_ticks() after that.
* Caller must hold the BQL, which serves as the mutex for vm_clock_seqlock.
*/
void cpu_disable_ticks(void)
{
/* Here, the real thing protected by the seqlock is cpu_clock_offset. */
seqlock_write_lock(&timers_state.vm_clock_seqlock);
if (timers_state.cpu_ticks_enabled) {
timers_state.cpu_ticks_offset = cpu_get_ticks();
timers_state.cpu_clock_offset = cpu_get_clock();
timers_state.cpu_ticks_offset += cpu_get_real_ticks();
timers_state.cpu_clock_offset = cpu_get_clock_locked();
timers_state.cpu_ticks_enabled = 0;
}
seqlock_write_unlock(&timers_state.vm_clock_seqlock);
}
/* Correlation between real and virtual time is always going to be
@ -196,13 +255,19 @@ static void icount_adjust(void)
int64_t cur_time;
int64_t cur_icount;
int64_t delta;
/* Protected by TimersState mutex. */
static int64_t last_delta;
/* If the VM is not running, then do nothing. */
if (!runstate_is_running()) {
return;
}
cur_time = cpu_get_clock();
cur_icount = qemu_get_clock_ns(vm_clock);
seqlock_write_lock(&timers_state.vm_clock_seqlock);
cur_time = cpu_get_clock_locked();
cur_icount = cpu_get_icount_locked();
delta = cur_icount - cur_time;
/* FIXME: This is a very crude algorithm, somewhat prone to oscillation. */
if (delta > 0
@ -219,19 +284,21 @@ static void icount_adjust(void)
}
last_delta = delta;
qemu_icount_bias = cur_icount - (qemu_icount << icount_time_shift);
seqlock_write_unlock(&timers_state.vm_clock_seqlock);
}
static void icount_adjust_rt(void *opaque)
{
qemu_mod_timer(icount_rt_timer,
qemu_get_clock_ms(rt_clock) + 1000);
timer_mod(icount_rt_timer,
qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + 1000);
icount_adjust();
}
static void icount_adjust_vm(void *opaque)
{
qemu_mod_timer(icount_vm_timer,
qemu_get_clock_ns(vm_clock) + get_ticks_per_sec() / 10);
timer_mod(icount_vm_timer,
qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
get_ticks_per_sec() / 10);
icount_adjust();
}
@ -242,48 +309,59 @@ static int64_t qemu_icount_round(int64_t count)
static void icount_warp_rt(void *opaque)
{
if (vm_clock_warp_start == -1) {
/* The icount_warp_timer is rescheduled soon after vm_clock_warp_start
* changes from -1 to another value, so the race here is okay.
*/
if (atomic_read(&vm_clock_warp_start) == -1) {
return;
}
seqlock_write_lock(&timers_state.vm_clock_seqlock);
if (runstate_is_running()) {
int64_t clock = qemu_get_clock_ns(rt_clock);
int64_t warp_delta = clock - vm_clock_warp_start;
if (use_icount == 1) {
qemu_icount_bias += warp_delta;
} else {
int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
int64_t warp_delta;
warp_delta = clock - vm_clock_warp_start;
if (use_icount == 2) {
/*
* In adaptive mode, do not let the vm_clock run too
* In adaptive mode, do not let QEMU_CLOCK_VIRTUAL run too
* far ahead of real time.
*/
int64_t cur_time = cpu_get_clock();
int64_t cur_icount = qemu_get_clock_ns(vm_clock);
int64_t cur_time = cpu_get_clock_locked();
int64_t cur_icount = cpu_get_icount_locked();
int64_t delta = cur_time - cur_icount;
qemu_icount_bias += MIN(warp_delta, delta);
}
if (qemu_clock_expired(vm_clock)) {
qemu_notify_event();
warp_delta = MIN(warp_delta, delta);
}
qemu_icount_bias += warp_delta;
}
vm_clock_warp_start = -1;
seqlock_write_unlock(&timers_state.vm_clock_seqlock);
if (qemu_clock_expired(QEMU_CLOCK_VIRTUAL)) {
qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
}
}
void qtest_clock_warp(int64_t dest)
{
int64_t clock = qemu_get_clock_ns(vm_clock);
int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
assert(qtest_enabled());
while (clock < dest) {
int64_t deadline = qemu_clock_deadline(vm_clock);
int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
int64_t warp = MIN(dest - clock, deadline);
seqlock_write_lock(&timers_state.vm_clock_seqlock);
qemu_icount_bias += warp;
qemu_run_timers(vm_clock);
clock = qemu_get_clock_ns(vm_clock);
seqlock_write_unlock(&timers_state.vm_clock_seqlock);
qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL);
clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
}
qemu_notify_event();
qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
}
void qemu_clock_warp(QEMUClock *clock)
void qemu_clock_warp(QEMUClockType type)
{
int64_t clock;
int64_t deadline;
/*
@ -291,20 +369,20 @@ void qemu_clock_warp(QEMUClock *clock)
* applicable to other clocks. But a clock argument removes the
* need for if statements all over the place.
*/
if (clock != vm_clock || !use_icount) {
if (type != QEMU_CLOCK_VIRTUAL || !use_icount) {
return;
}
/*
* If the CPUs have been sleeping, advance the vm_clock timer now. This
* ensures that the deadline for the timer is computed correctly below.
* If the CPUs have been sleeping, advance QEMU_CLOCK_VIRTUAL timer now.
* This ensures that the deadline for the timer is computed correctly below.
* This also makes sure that the insn counter is synchronized before the
* CPU starts running, in case the CPU is woken by an event other than
* the earliest vm_clock timer.
* the earliest QEMU_CLOCK_VIRTUAL timer.
*/
icount_warp_rt(NULL);
if (!all_cpu_threads_idle() || !qemu_clock_has_timers(vm_clock)) {
qemu_del_timer(icount_warp_timer);
timer_del(icount_warp_timer);
if (!all_cpu_threads_idle()) {
return;
}
@ -313,28 +391,39 @@ void qemu_clock_warp(QEMUClock *clock)
return;
}
vm_clock_warp_start = qemu_get_clock_ns(rt_clock);
deadline = qemu_clock_deadline(vm_clock);
/* We want to use the earliest deadline from ALL vm_clocks */
clock = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
if (deadline < 0) {
return;
}
if (deadline > 0) {
/*
* Ensure the vm_clock proceeds even when the virtual CPU goes to
* Ensure QEMU_CLOCK_VIRTUAL proceeds even when the virtual CPU goes to
* sleep. Otherwise, the CPU might be waiting for a future timer
* interrupt to wake it up, but the interrupt never comes because
* the vCPU isn't running any insns and thus doesn't advance the
* vm_clock.
* QEMU_CLOCK_VIRTUAL.
*
* An extreme solution for this problem would be to never let VCPUs
* sleep in icount mode if there is a pending vm_clock timer; rather
* time could just advance to the next vm_clock event. Instead, we
* do stop VCPUs and only advance vm_clock after some "real" time,
* (related to the time left until the next event) has passed. This
* rt_clock timer will do this. This avoids that the warps are too
* visible externally---for example, you will not be sending network
* packets continuously instead of every 100ms.
* sleep in icount mode if there is a pending QEMU_CLOCK_VIRTUAL
* timer; rather time could just advance to the next QEMU_CLOCK_VIRTUAL
* event. Instead, we do stop VCPUs and only advance QEMU_CLOCK_VIRTUAL
* after some "real" time (related to the time left until the next
* event) has passed. The QEMU_CLOCK_REALTIME timer will do this.
* This keeps the warps from being visible externally; for example,
* you will not be sending network packets continuously instead of
* every 100ms.
*/
qemu_mod_timer(icount_warp_timer, vm_clock_warp_start + deadline);
} else {
qemu_notify_event();
seqlock_write_lock(&timers_state.vm_clock_seqlock);
if (vm_clock_warp_start == -1 || vm_clock_warp_start > clock) {
vm_clock_warp_start = clock;
}
seqlock_write_unlock(&timers_state.vm_clock_seqlock);
timer_mod_anticipate(icount_warp_timer, clock + deadline);
} else if (deadline == 0) {
qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
}
}
@ -353,12 +442,14 @@ static const VMStateDescription vmstate_timers = {
void configure_icount(const char *option)
{
seqlock_init(&timers_state.vm_clock_seqlock, NULL);
vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
if (!option) {
return;
}
icount_warp_timer = qemu_new_timer_ns(rt_clock, icount_warp_rt, NULL);
icount_warp_timer = timer_new_ns(QEMU_CLOCK_REALTIME,
icount_warp_rt, NULL);
if (strcmp(option, "auto") != 0) {
icount_time_shift = strtol(option, NULL, 0);
use_icount = 1;
@ -376,12 +467,15 @@ void configure_icount(const char *option)
the virtual time trigger catches emulated time passing too fast.
Realtime triggers occur even when idle, so use them less frequently
than VM triggers. */
icount_rt_timer = qemu_new_timer_ms(rt_clock, icount_adjust_rt, NULL);
qemu_mod_timer(icount_rt_timer,
qemu_get_clock_ms(rt_clock) + 1000);
icount_vm_timer = qemu_new_timer_ns(vm_clock, icount_adjust_vm, NULL);
qemu_mod_timer(icount_vm_timer,
qemu_get_clock_ns(vm_clock) + get_ticks_per_sec() / 10);
icount_rt_timer = timer_new_ms(QEMU_CLOCK_REALTIME,
icount_adjust_rt, NULL);
timer_mod(icount_rt_timer,
qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + 1000);
icount_vm_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
icount_adjust_vm, NULL);
timer_mod(icount_vm_timer,
qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
get_ticks_per_sec() / 10);
}
/***********************************************************/
@ -394,7 +488,7 @@ void hw_error(const char *fmt, ...)
fprintf(stderr, "qemu: hardware error: ");
vfprintf(stderr, fmt, ap);
fprintf(stderr, "\n");
for (cpu = first_cpu; cpu != NULL; cpu = cpu->next_cpu) {
CPU_FOREACH(cpu) {
fprintf(stderr, "CPU #%d:\n", cpu->cpu_index);
cpu_dump_state(cpu, stderr, fprintf, CPU_DUMP_FPU);
}
@ -406,7 +500,7 @@ void cpu_synchronize_all_states(void)
{
CPUState *cpu;
for (cpu = first_cpu; cpu; cpu = cpu->next_cpu) {
CPU_FOREACH(cpu) {
cpu_synchronize_state(cpu);
}
}
@ -415,7 +509,7 @@ void cpu_synchronize_all_post_reset(void)
{
CPUState *cpu;
for (cpu = first_cpu; cpu; cpu = cpu->next_cpu) {
CPU_FOREACH(cpu) {
cpu_synchronize_post_reset(cpu);
}
}
@ -424,16 +518,11 @@ void cpu_synchronize_all_post_init(void)
{
CPUState *cpu;
for (cpu = first_cpu; cpu; cpu = cpu->next_cpu) {
CPU_FOREACH(cpu) {
cpu_synchronize_post_init(cpu);
}
}
bool cpu_is_stopped(CPUState *cpu)
{
return !runstate_is_running() || cpu->stopped;
}
static int do_vm_stop(RunState state)
{
int ret = 0;
@ -457,7 +546,7 @@ static bool cpu_can_run(CPUState *cpu)
if (cpu->stop) {
return false;
}
if (cpu->stopped || !runstate_is_running()) {
if (cpu_is_stopped(cpu)) {
return false;
}
return true;
@ -735,7 +824,7 @@ static void qemu_tcg_wait_io_event(void)
while (all_cpu_threads_idle()) {
/* Start accounting real time to the virtual clock if the CPUs
are idle. */
qemu_clock_warp(vm_clock);
qemu_clock_warp(QEMU_CLOCK_VIRTUAL);
qemu_cond_wait(tcg_halt_cond, &qemu_global_mutex);
}
@ -743,7 +832,7 @@ static void qemu_tcg_wait_io_event(void)
qemu_cond_wait(&qemu_io_proceeded_cond, &qemu_global_mutex);
}
for (cpu = first_cpu; cpu != NULL; cpu = cpu->next_cpu) {
CPU_FOREACH(cpu) {
qemu_wait_io_event_common(cpu);
}
}
@ -837,12 +926,6 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
static void tcg_exec_all(void);
static void tcg_signal_cpu_creation(CPUState *cpu, void *data)
{
cpu->thread_id = qemu_get_thread_id();
cpu->created = true;
}
static void *qemu_tcg_cpu_thread_fn(void *arg)
{
CPUState *cpu = arg;
@ -851,23 +934,31 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
qemu_thread_get_self(cpu->thread);
qemu_mutex_lock(&qemu_global_mutex);
qemu_for_each_cpu(tcg_signal_cpu_creation, NULL);
CPU_FOREACH(cpu) {
cpu->thread_id = qemu_get_thread_id();
cpu->created = true;
}
qemu_cond_signal(&qemu_cpu_cond);
/* wait for initial kick-off after machine start */
while (first_cpu->stopped) {
while (QTAILQ_FIRST(&cpus)->stopped) {
qemu_cond_wait(tcg_halt_cond, &qemu_global_mutex);
/* process any pending work */
for (cpu = first_cpu; cpu != NULL; cpu = cpu->next_cpu) {
CPU_FOREACH(cpu) {
qemu_wait_io_event_common(cpu);
}
}
while (1) {
tcg_exec_all();
if (use_icount && qemu_clock_deadline(vm_clock) <= 0) {
qemu_notify_event();
if (use_icount) {
int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
if (deadline == 0) {
qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
}
}
qemu_tcg_wait_io_event();
}
@ -969,13 +1060,12 @@ void qemu_mutex_unlock_iothread(void)
static int all_vcpus_paused(void)
{
CPUState *cpu = first_cpu;
CPUState *cpu;
while (cpu) {
CPU_FOREACH(cpu) {
if (!cpu->stopped) {
return 0;
}
cpu = cpu->next_cpu;
}
return 1;
@ -983,23 +1073,20 @@ static int all_vcpus_paused(void)
void pause_all_vcpus(void)
{
CPUState *cpu = first_cpu;
CPUState *cpu;
qemu_clock_enable(vm_clock, false);
while (cpu) {
qemu_clock_enable(QEMU_CLOCK_VIRTUAL, false);
CPU_FOREACH(cpu) {
cpu->stop = true;
qemu_cpu_kick(cpu);
cpu = cpu->next_cpu;
}
if (qemu_in_vcpu_thread()) {
cpu_stop_current();
if (!kvm_enabled()) {
cpu = first_cpu;
while (cpu) {
CPU_FOREACH(cpu) {
cpu->stop = false;
cpu->stopped = true;
cpu = cpu->next_cpu;
}
return;
}
@ -1007,10 +1094,8 @@ void pause_all_vcpus(void)
while (!all_vcpus_paused()) {
qemu_cond_wait(&qemu_pause_cond, &qemu_global_mutex);
cpu = first_cpu;
while (cpu) {
CPU_FOREACH(cpu) {
qemu_cpu_kick(cpu);
cpu = cpu->next_cpu;
}
}
}
@ -1024,12 +1109,11 @@ void cpu_resume(CPUState *cpu)
void resume_all_vcpus(void)
{
CPUState *cpu = first_cpu;
CPUState *cpu;
qemu_clock_enable(vm_clock, true);
while (cpu) {
qemu_clock_enable(QEMU_CLOCK_VIRTUAL, true);
CPU_FOREACH(cpu) {
cpu_resume(cpu);
cpu = cpu->next_cpu;
}
}
@ -1145,11 +1229,23 @@ static int tcg_cpu_exec(CPUArchState *env)
#endif
if (use_icount) {
int64_t count;
int64_t deadline;
int decr;
qemu_icount -= (env->icount_decr.u16.low + env->icount_extra);
env->icount_decr.u16.low = 0;
env->icount_extra = 0;
count = qemu_icount_round(qemu_clock_deadline(vm_clock));
deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
/* Maintain prior (possibly buggy) behaviour where if no deadline
* was set (as there is no QEMU_CLOCK_VIRTUAL timer) or it is more than
* INT32_MAX nanoseconds ahead, we still use INT32_MAX
* nanoseconds.
*/
if ((deadline < 0) || (deadline > INT32_MAX)) {
deadline = INT32_MAX;
}
count = qemu_icount_round(deadline);
qemu_icount += count;
decr = (count > 0xffff) ? 0xffff : count;
count -= decr;
@ -1175,17 +1271,17 @@ static void tcg_exec_all(void)
{
int r;
/* Account partial waits to the vm_clock. */
qemu_clock_warp(vm_clock);
/* Account partial waits to QEMU_CLOCK_VIRTUAL. */
qemu_clock_warp(QEMU_CLOCK_VIRTUAL);
if (next_cpu == NULL) {
next_cpu = first_cpu;
}
for (; next_cpu != NULL && !exit_request; next_cpu = next_cpu->next_cpu) {
for (; next_cpu != NULL && !exit_request; next_cpu = CPU_NEXT(next_cpu)) {
CPUState *cpu = next_cpu;
CPUArchState *env = cpu->env_ptr;
qemu_clock_enable(vm_clock,
qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
(cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
if (cpu_can_run(cpu)) {
@ -1206,7 +1302,7 @@ void set_numa_modes(void)
CPUState *cpu;
int i;
for (cpu = first_cpu; cpu != NULL; cpu = cpu->next_cpu) {
CPU_FOREACH(cpu) {
for (i = 0; i < nb_numa_nodes; i++) {
if (test_bit(cpu->cpu_index, node_cpumask[i])) {
cpu->numa_node = i;
@ -1228,7 +1324,7 @@ CpuInfoList *qmp_query_cpus(Error **errp)
CpuInfoList *head = NULL, *cur_item = NULL;
CPUState *cpu;
for (cpu = first_cpu; cpu != NULL; cpu = cpu->next_cpu) {
CPU_FOREACH(cpu) {
CpuInfoList *info;
#if defined(TARGET_I386)
X86CPU *x86_cpu = X86_CPU(cpu);
@ -1309,7 +1405,10 @@ void qmp_memsave(int64_t addr, int64_t size, const char *filename,
l = sizeof(buf);
if (l > size)
l = size;
cpu_memory_rw_debug(cpu, addr, buf, l, 0);
if (cpu_memory_rw_debug(cpu, addr, buf, l, 0) != 0) {
error_setg(errp, "Invalid addr 0x%016" PRIx64 " specified", addr);
goto exit;
}
if (fwrite(buf, 1, l, f) != l) {
error_set(errp, QERR_IO_ERROR);
goto exit;
@ -1357,7 +1456,7 @@ void qmp_inject_nmi(Error **errp)
#if defined(TARGET_I386)
CPUState *cs;
for (cs = first_cpu; cs != NULL; cs = cs->next_cpu) {
CPU_FOREACH(cs) {
X86CPU *cpu = X86_CPU(cs);
CPUX86State *env = &cpu->env;
@ -1367,6 +1466,20 @@ void qmp_inject_nmi(Error **errp)
apic_deliver_nmi(env->apic_state);
}
}
#elif defined(TARGET_S390X)
CPUState *cs;
S390CPU *cpu;
CPU_FOREACH(cs) {
cpu = S390_CPU(cs);
if (cpu->env.cpu_num == monitor_get_cpu_index()) {
if (s390_cpu_restart(S390_CPU(cs)) == -1) {
error_set(errp, QERR_UNSUPPORTED);
return;
}
break;
}
}
#else
error_set(errp, QERR_UNSUPPORTED);
#endif

View File

@ -169,27 +169,12 @@ static inline ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr)
return ram_addr;
}
static inline void tlb_update_dirty(CPUTLBEntry *tlb_entry)
{
ram_addr_t ram_addr;
void *p;
if (tlb_is_dirty_ram(tlb_entry)) {
p = (void *)(uintptr_t)((tlb_entry->addr_write & TARGET_PAGE_MASK)
+ tlb_entry->addend);
ram_addr = qemu_ram_addr_from_host_nofail(p);
if (!cpu_physical_memory_is_dirty(ram_addr)) {
tlb_entry->addr_write |= TLB_NOTDIRTY;
}
}
}
void cpu_tlb_reset_dirty_all(ram_addr_t start1, ram_addr_t length)
{
CPUState *cpu;
CPUArchState *env;
for (cpu = first_cpu; cpu != NULL; cpu = cpu->next_cpu) {
CPU_FOREACH(cpu) {
int mmu_idx;
env = cpu->env_ptr;

View File

@ -1,3 +1 @@
# Default configuration for arm-linux-user
CONFIG_GDBSTUB_XML=y

View File

@ -2,7 +2,6 @@
include pci.mak
include usb.mak
CONFIG_GDBSTUB_XML=y
CONFIG_VGA=y
CONFIG_ISA_MMIO=y
CONFIG_NAND=y
@ -34,9 +33,9 @@ CONFIG_PFLASH_CFI02=y
CONFIG_MICRODRIVE=y
CONFIG_USB_MUSB=y
CONFIG_ARM9MPCORE=y
CONFIG_ARM11MPCORE=y
CONFIG_ARM15MPCORE=y
CONFIG_A9MPCORE=y
CONFIG_A15MPCORE=y
CONFIG_ARM_GIC=y
CONFIG_ARM_GIC_KVM=$(CONFIG_KVM)
@ -62,6 +61,7 @@ CONFIG_BITBANG_I2C=y
CONFIG_FRAMEBUFFER=y
CONFIG_XILINX_SPIPS=y
CONFIG_ARM11SCU=y
CONFIG_A9SCU=y
CONFIG_MARVELL_88W8618=y
CONFIG_OMAP=y
@ -80,3 +80,4 @@ CONFIG_VERSATILE_PCI=y
CONFIG_VERSATILE_I2C=y
CONFIG_SDHCI=y
CONFIG_INTEGRATOR_DEBUG=y

View File

@ -1,3 +1 @@
# Default configuration for armeb-linux-user
CONFIG_GDBSTUB_XML=y

View File

@ -1,3 +1 @@
# Default configuration for m68k-linux-user
CONFIG_GDBSTUB_XML=y

View File

@ -3,5 +3,4 @@
include pci.mak
include usb.mak
CONFIG_COLDFIRE=y
CONFIG_GDBSTUB_XML=y
CONFIG_PTIMER=y

View File

@ -1,3 +1 @@
# Default configuration for ppc-linux-user
CONFIG_GDBSTUB_XML=y

View File

@ -3,7 +3,6 @@
include pci.mak
include sound.mak
include usb.mak
CONFIG_GDBSTUB_XML=y
CONFIG_ISA_MMIO=y
CONFIG_ESCC=y
CONFIG_M48T59=y

View File

@ -1,3 +1 @@
# Default configuration for ppc64-linux-user
CONFIG_GDBSTUB_XML=y

View File

@ -3,7 +3,6 @@
include pci.mak
include sound.mak
include usb.mak
CONFIG_GDBSTUB_XML=y
CONFIG_ISA_MMIO=y
CONFIG_ESCC=y
CONFIG_M48T59=y
@ -47,6 +46,7 @@ CONFIG_E500=y
CONFIG_OPENPIC_KVM=$(and $(CONFIG_E500),$(CONFIG_KVM))
# For pSeries
CONFIG_XICS=$(CONFIG_PSERIES)
CONFIG_XICS_KVM=$(and $(CONFIG_PSERIES),$(CONFIG_KVM))
# For PReP
CONFIG_I82378=y
CONFIG_I8259=y

View File

@ -1,3 +1 @@
# Default configuration for ppc64abi32-linux-user
CONFIG_GDBSTUB_XML=y

View File

@ -3,7 +3,6 @@
include pci.mak
include sound.mak
include usb.mak
CONFIG_GDBSTUB_XML=y
CONFIG_ISA_MMIO=y
CONFIG_ESCC=y
CONFIG_M48T59=y

47
disas.c
View File

@ -158,6 +158,35 @@ print_insn_thumb1(bfd_vma pc, disassemble_info *info)
}
#endif
static int print_insn_objdump(bfd_vma pc, disassemble_info *info,
const char *prefix)
{
int i, n = info->buffer_length;
uint8_t *buf = g_malloc(n);
info->read_memory_func(pc, buf, n, info);
for (i = 0; i < n; ++i) {
if (i % 32 == 0) {
info->fprintf_func(info->stream, "\n%s: ", prefix);
}
info->fprintf_func(info->stream, "%02x", buf[i]);
}
g_free(buf);
return n;
}
static int print_insn_od_host(bfd_vma pc, disassemble_info *info)
{
return print_insn_objdump(pc, info, "OBJD-H");
}
static int print_insn_od_target(bfd_vma pc, disassemble_info *info)
{
return print_insn_objdump(pc, info, "OBJD-T");
}
/* Disassemble this for me please... (debugging). 'flags' has the following
values:
i386 - 1 means 16 bit code, 2 means 64 bit code
@ -171,7 +200,7 @@ void target_disas(FILE *out, CPUArchState *env, target_ulong code,
target_ulong pc;
int count;
CPUDebug s;
int (*print_insn)(bfd_vma pc, disassemble_info *info);
int (*print_insn)(bfd_vma pc, disassemble_info *info) = NULL;
INIT_DISASSEMBLE_INFO(s.info, out, fprintf);
@ -263,11 +292,10 @@ void target_disas(FILE *out, CPUArchState *env, target_ulong code,
#elif defined(TARGET_LM32)
s.info.mach = bfd_mach_lm32;
print_insn = print_insn_lm32;
#else
fprintf(out, "0x" TARGET_FMT_lx
": Asm output not supported on this arch\n", code);
return;
#endif
if (print_insn == NULL) {
print_insn = print_insn_od_target;
}
for (pc = code; size > 0; pc += count, size -= count) {
fprintf(out, "0x" TARGET_FMT_lx ": ", pc);
@ -303,7 +331,7 @@ void disas(FILE *out, void *code, unsigned long size)
uintptr_t pc;
int count;
CPUDebug s;
int (*print_insn)(bfd_vma pc, disassemble_info *info);
int (*print_insn)(bfd_vma pc, disassemble_info *info) = NULL;
INIT_DISASSEMBLE_INFO(s.info, out, fprintf);
s.info.print_address_func = generic_print_host_address;
@ -347,11 +375,10 @@ void disas(FILE *out, void *code, unsigned long size)
print_insn = print_insn_hppa;
#elif defined(__ia64__)
print_insn = print_insn_ia64;
#else
fprintf(out, "0x%lx: Asm output not supported on this arch\n",
(long) code);
return;
#endif
if (print_insn == NULL) {
print_insn = print_insn_od_host;
}
for (pc = (uintptr_t)code; size > 0; pc += count, size -= count) {
fprintf(out, "0x%08" PRIxPTR ": ", pc);
count = print_insn(pc, &s.info);

View File

@ -5157,7 +5157,8 @@ int
print_insn_ppc (bfd_vma memaddr, struct disassemble_info *info)
{
int dialect = (char *) info->private_data - (char *) 0;
return print_insn_powerpc (memaddr, info, 1, dialect);
return print_insn_powerpc (memaddr, info, info->endian == BFD_ENDIAN_BIG,
dialect);
}
/* Print a big endian PowerPC instruction. */

View File

@ -11,6 +11,7 @@
#include "trace.h"
#include "qemu/range.h"
#include "qemu/thread.h"
#include "qemu/main-loop.h"
/* #define DEBUG_IOMMU */

View File

@ -52,7 +52,7 @@ Configuring and building:
Assuming you have a working smartcard on the host with the current
user, using NSS, qemu acts as another NSS client using ccid-card-emulated:
qemu -usb -device usb-ccid -device ccid-card-emualated
qemu -usb -device usb-ccid -device ccid-card-emulated
4. Using ccid-card-emulated with certificates

View File

@ -52,6 +52,15 @@ MemoryRegion):
hole". Aliases may point to any type of region, including other aliases,
but an alias may not point back to itself, directly or indirectly.
It is valid to add subregions to a region which is not a pure container
(that is, to an MMIO, RAM or ROM region). This means that the region
will act like a container, except that any addresses within the container's
region which are not claimed by any subregion are handled by the
container itself (i.e. by its MMIO callbacks or RAM backing). However
it is generally possible to achieve the same effect with a pure container
one of whose subregions is a low priority "background" region covering
the whole address range; this is often clearer and is preferred.
Subregions cannot be added to an alias region.
Region names
------------
@ -80,6 +89,53 @@ guest. This is done with memory_region_add_subregion_overlap(), which
allows the region to overlap any other region in the same container, and
specifies a priority that allows the core to decide which of two regions at
the same address are visible (highest wins).
Priority values are signed, and the default value is zero. This means that
you can use memory_region_add_subregion_overlap() both to specify a region
that must sit 'above' any others (with a positive priority) and also a
background region that sits 'below' others (with a negative priority).
If the higher priority region in an overlap is a container or alias, then
the lower priority region will appear in any "holes" that the higher priority
region has left by not mapping subregions to that area of its address range.
(This applies recursively -- if the subregions are themselves containers or
aliases that leave holes then the lower priority region will appear in these
holes too.)
For example, suppose we have a container A of size 0x8000 with two subregions
B and C. B is a container mapped at 0x2000, size 0x4000, priority 2; C is
an MMIO region mapped at 0x0, size 0x6000, priority 1. B currently has two
of its own subregions: D of size 0x1000 at offset 0 and E of size 0x1000 at
offset 0x2000. As a diagram:
        0      1000   2000   3000   4000   5000   6000   7000   8000
        |------|------|------|------|------|------|------|------|
  A:    [                                                       ]
  C:    [CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC]
  B:                  [                          ]
  D:                  [DDDDD]
  E:                                [EEEEE]
The regions that will be seen within this address range then are:
[CCCCCCCCCCCC][DDDDD][CCCCC][EEEEE][CCCCC]
Since B has higher priority than C, its subregions appear in the flat map
even where they overlap with C. In ranges where B has not mapped anything
C's region appears.
If B had provided its own MMIO operations (i.e. it was not a pure container)
then these would be used for any addresses in its range not handled by
D or E, and the result would be:
[CCCCCCCCCCCC][DDDDD][BBBBB][EEEEE][BBBBB]
Priority values are local to a container, because the priorities of two
regions are only compared when they are both children of the same container.
This means that the device in charge of the container (typically modelling
a bus or a memory controller) can use them to manage the interaction of
its child regions without any side effects on other parts of the system.
In the example above, the priorities of D and E are unimportant because
they do not overlap each other. It is the relative priority of B and C
that causes D and E to appear on top of C: D and E's priorities are never
compared against the priority of C.
Visibility
----------
@ -90,11 +146,19 @@ guest accesses an address:
descending priority order
- if the address lies outside the region offset/size, the subregion is
discarded
- if the subregion is a leaf (RAM or MMIO), the search terminates
- if the subregion is a leaf (RAM or MMIO), the search terminates, returning
this leaf region
- if the subregion is a container, the same algorithm is used within the
subregion (after the address is adjusted by the subregion offset)
- if the subregion is an alias, the search is continues at the alias target
- if the subregion is an alias, the search is continued at the alias target
(after the address is adjusted by the subregion offset and alias offset)
- if a recursive search within a container or alias subregion does not
find a match (because of a "hole" in the container's coverage of its
address range), then if this is a container with its own MMIO or RAM
backing the search terminates, returning the container itself. Otherwise
we continue with the next subregion in priority order
- if none of the subregions match the address then the search terminates
with no match found
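
The lookup rules above can be sketched in C as follows. This is only an
illustrative model, not QEMU's implementation: the Region struct, its field
names and region_lookup() are hypothetical, and each container's subregion
array is assumed to be already sorted by descending priority.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified region model; this is not QEMU's MemoryRegion. */
typedef struct Region Region;
struct Region {
    const char *name;
    uint64_t offset;        /* offset within the parent container */
    uint64_t size;
    int priority;           /* signed; only compared between siblings */
    int has_backing;        /* nonzero if the region has its own MMIO/RAM */
    Region *alias_target;   /* non-NULL if this region is an alias */
    uint64_t alias_offset;  /* offset into the alias target */
    Region **subregions;    /* assumed sorted by descending priority */
    size_t nb_subregions;
};

/*
 * Resolve addr (relative to the start of r) to the region that handles it.
 * Returns NULL if the address falls into a hole that nothing claims.
 */
static Region *region_lookup(Region *r, uint64_t addr)
{
    size_t i;

    assert(r != NULL);
    if (addr >= r->size) {
        return NULL;
    }
    if (r->alias_target) {
        /* An alias continues the search at its target. */
        return region_lookup(r->alias_target, addr + r->alias_offset);
    }
    for (i = 0; i < r->nb_subregions; i++) {
        Region *sub = r->subregions[i];
        Region *hit;

        /* Discard subregions that do not cover the address. */
        if (addr < sub->offset || addr - sub->offset >= sub->size) {
            continue;
        }
        hit = region_lookup(sub, addr - sub->offset);
        if (hit) {
            return hit;
        }
        /* A hole in this subregion: try the next one in priority order. */
    }
    /* A leaf, or a container with its own backing, handles the rest. */
    return r->has_backing ? r : NULL;
}
```

A leaf is modelled here as a region with backing and no subregions, so the
loop is skipped and the region itself is returned. A pure container has no
backing, so an unclaimed address yields NULL and the caller moves on to its
next subregion.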
Example memory map
------------------

View File

@ -91,6 +91,29 @@
port = "4"
chassis = "4"
##
# Example PCIe switch with two downstream ports
#
#[device "pcie-switch-upstream-port-1"]
# driver = "x3130-upstream"
# bus = "ich9-pcie-port-4"
# addr = "00.0"
#
#[device "pcie-switch-downstream-port-1-1"]
# driver = "xio3130-downstream"
# multifunction = "on"
# bus = "pcie-switch-upstream-port-1"
# addr = "00.0"
# port = "1"
# chassis = "5"
#
#[device "pcie-switch-downstream-port-1-2"]
# driver = "xio3130-downstream"
# multifunction = "on"
# bus = "pcie-switch-upstream-port-1"
# addr = "00.1"
# port = "1"
# chassis = "6"
[device "ich9-ehci-1"]
driver = "ich9-usb-ehci1"

View File

@ -53,6 +53,23 @@ The use of '*' as a prefix to the name means the member is optional. Optional
members should always be added to the end of the dictionary to preserve
backwards compatibility.
A complex type definition can specify another complex type as its base.
In this case, the fields of the base type are included as top-level fields
of the new complex type's dictionary in the QMP wire format. An example
definition is:
{ 'type': 'BlockdevOptionsGenericFormat', 'data': { 'file': 'str' } }
{ 'type': 'BlockdevOptionsGenericCOWFormat',
'base': 'BlockdevOptionsGenericFormat',
'data': { '*backing': 'str' } }
An example BlockdevOptionsGenericCOWFormat object on the wire could use
both fields like this:
{ "file": "/some/place/my-image",
"backing": "/some/place/my-backing-file" }
=== Enumeration types ===
An enumeration type is a dictionary containing a single key whose value is a
@ -147,7 +164,7 @@ This example allows using both of the following example objects:
{ "file": "my_existing_block_device_id" }
{ "file": { "driver": "file",
"readonly": false,
'filename': "/tmp/mydisk.qcow2" } }
"filename": "/tmp/mydisk.qcow2" } }
=== Commands ===

87
docs/qmp/README Normal file
View File

@ -0,0 +1,87 @@
QEMU Machine Protocol
=====================
Introduction
------------
The QEMU Machine Protocol (QMP) allows applications to operate a
QEMU instance.
QMP is JSON[1] based and features the following:
- Lightweight, text-based, easy to parse data format
- Asynchronous messages support (i.e. events)
- Capabilities Negotiation
For detailed information on QMP's usage, please refer to the following files:
o qmp-spec.txt QEMU Machine Protocol current specification
o qmp-commands.txt QMP supported commands (auto-generated at build-time)
o qmp-events.txt List of available asynchronous events
[1] http://www.json.org
Usage
-----
You can use the -qmp option to enable QMP. For example, the following
makes QMP available on localhost port 4444:
$ qemu [...] -qmp tcp:localhost:4444,server,nowait
However, for more flexibility and to make use of more options, the -mon
command-line option should be used. For instance, the following example
creates one HMP instance (human monitor) on stdio and one QMP instance
on localhost port 4444:
$ qemu [...] -chardev stdio,id=mon0 -mon chardev=mon0,mode=readline \
-chardev socket,id=mon1,host=localhost,port=4444,server,nowait \
-mon chardev=mon1,mode=control,pretty=on
Please refer to QEMU's manpage for more information.
Simple Testing
--------------
To manually test QMP one can connect with telnet and issue commands by hand:
$ telnet localhost 4444
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
{
"QMP": {
"version": {
"qemu": {
"micro": 50,
"minor": 6,
"major": 1
},
"package": ""
},
"capabilities": [
]
}
}
{ "execute": "qmp_capabilities" }
{
"return": {
}
}
{ "execute": "query-status" }
{
"return": {
"status": "prelaunch",
"singlestep": false,
"running": false
}
}
Please, refer to the qapi-schema.json file for a complete command reference.
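The same exchange can be scripted. The following is a minimal Python sketch, not a reference client: the function names are hypothetical, and it assumes the server was started without pretty=on so that each message arrives on a single line. It reads the greeting, issues qmp_capabilities, and then runs one command, skipping any asynchronous events that arrive before the reply.

```python
import json
import socket


def read_message(stream):
    """Read one JSON message, assuming one message per line (no pretty=on)."""
    line = stream.readline()
    if not line:
        raise ConnectionError("QMP peer closed the connection")
    return json.loads(line)


def execute(stream, command):
    """Send a command without arguments and return its "return" member."""
    stream.write(json.dumps({"execute": command}) + "\n")
    stream.flush()
    while True:
        reply = read_message(stream)
        if "event" in reply:
            continue  # asynchronous event, not the reply to our command
        if "error" in reply:
            raise RuntimeError(reply["error"]["desc"])
        return reply["return"]


def handshake(stream):
    """Consume the greeting and leave capabilities negotiation mode."""
    greeting = read_message(stream)
    if "QMP" not in greeting:
        raise ValueError("expected a QMP greeting")
    execute(stream, "qmp_capabilities")
    return greeting["QMP"]["version"]


def connect(host="localhost", port=4444):
    """Open a TCP connection to a QMP server and perform the handshake."""
    sock = socket.create_connection((host, port))
    stream = sock.makefile("rw", encoding="utf-8", newline="\n")
    return stream, handshake(stream)
```

With a QEMU instance listening as in the Usage section, calling connect() and then execute(stream, "query-status") would correspond to the telnet session shown above.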
QMP wiki page
-------------
http://wiki.qemu-project.org/QMP


@ -1,4 +1,4 @@
QEMU Monitor Protocol Events
QEMU Machine Protocol Events
============================
BALLOON_CHANGE
@ -18,6 +18,28 @@ Example:
"data": { "actual": 944766976 },
"timestamp": { "seconds": 1267020223, "microseconds": 435656 } }
BLOCK_IMAGE_CORRUPTED
---------------------
Emitted when a disk image is being marked corrupt.
Data:
- "device": Device name (json-string)
- "msg": Informative message (e.g., reason for the corruption) (json-string)
- "offset": If the corruption resulted from an image access, this is the access
offset into the image (json-int)
- "size": If the corruption resulted from an image access, this is the access
size (json-int)
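A client might turn the data members listed above into a log line along the following lines. This is a hedged Python sketch: the function name describe_corruption is hypothetical, and "offset" and "size" are treated as optional on the assumption that they are only present when the corruption resulted from an image access.

```python
def describe_corruption(event):
    """Format a BLOCK_IMAGE_CORRUPTED event as a one-line description.

    Assumption: "offset" and "size" may be absent, since they only apply
    when the corruption was detected during an image access.
    """
    if event.get("event") != "BLOCK_IMAGE_CORRUPTED":
        raise ValueError("not a BLOCK_IMAGE_CORRUPTED event")
    data = event["data"]
    text = f'{data["device"]}: {data["msg"]}'
    if "offset" in data and "size" in data:
        text += f' (access at offset {data["offset"]}, size {data["size"]})'
    return text
```

The "msg" member is passed through unchanged because it is informative text only and is not meant to be parsed.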
Example:
{ "event": "BLOCK_IMAGE_CORRUPTED",
"data": { "device": "ide0-hd0",
"msg": "Prevented active L1 table overwrite", "offset": 196608,
"size": 65536 },
"timestamp": { "seconds": 1378126126, "microseconds": 966463 } }
BLOCK_IO_ERROR
--------------
@ -137,7 +159,7 @@ Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
event.
DEVICE_DELETED
-----------------
--------------
Emitted whenever the device removal completion is acknowledged
by the guest.
@ -172,8 +194,22 @@ Data:
},
"timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
GUEST_PANICKED
--------------
Emitted when guest OS panic is detected.
Data:
- "action": Action that has been taken (json-string, currently always "pause").
Example:
{ "event": "GUEST_PANICKED",
"data": { "action": "pause" } }
NIC_RX_FILTER_CHANGED
-----------------
---------------------
The event is emitted once until the query command is executed,
the first event will always be emitted.
@ -464,17 +500,3 @@ Example:
Note: If action is "reset", "shutdown", or "pause" the WATCHDOG event is
followed respectively by the RESET, SHUTDOWN, or STOP events.
GUEST_PANICKED
--------------
Emitted when guest OS panic is detected.
Data:
- "action": Action that has been taken (json-string, currently always "pause").
Example:
{ "event": "GUEST_PANICKED",
"data": { "action": "pause" } }


@ -1,21 +1,17 @@
QEMU Monitor Protocol Specification - Version 0.1
QEMU Machine Protocol Specification
1. Introduction
===============
This document specifies the QEMU Monitor Protocol (QMP), a JSON-based protocol
which is available for applications to control QEMU at the machine-level.
To enable QMP support, QEMU has to be run in "control mode". This is done by
starting QEMU with the appropriate command-line options. Please, refer to the
QEMU manual page for more information.
This document specifies the QEMU Machine Protocol (QMP), a JSON-based protocol
which is available for applications to operate QEMU at the machine-level.
2. Protocol Specification
=========================
This section details the protocol format. For the purpose of this document
"Client" is any application which is communicating with QEMU in control mode,
and "Server" is QEMU itself.
"Client" is any application which is using QMP to communicate with QEMU and
"Server" is QEMU itself.
JSON data structures, when mentioned in this document, are always in the
following format:
@ -47,14 +43,14 @@ that the connection has been successfully established and that the Server is
ready for capabilities negotiation (for more information refer to section
'4. Capabilities Negotiation').
The format is:
The greeting message format is:
{ "QMP": { "version": json-object, "capabilities": json-array } }
Where,
- The "version" member contains the Server's version information (the format
is the same of the 'query-version' command)
is the same of the query-version command)
- The "capabilities" member specify the availability of features beyond the
baseline specification
@ -83,10 +79,7 @@ of a command execution: success or error.
2.4.1 success
-------------
The success response is issued when the command execution has finished
without errors.
The format is:
The format of a success response is:
{ "return": json-object, "id": json-value }
@ -96,15 +89,12 @@ The format is:
in a per-command basis or an empty json-object if the command does not
return data
- The "id" member contains the transaction identification associated
with the command execution (if issued by the Client)
with the command execution if issued by the Client
2.4.2 error
-----------
The error response is issued when the command execution could not be
completed because of an error condition.
The format is:
The format of an error response is:
{ "error": { "class": json-string, "desc": json-string }, "id": json-value }
@ -114,7 +104,7 @@ The format is:
- The "desc" member is a human-readable error message. Clients should
not attempt to parse this message.
- The "id" member contains the transaction identification associated with
the command execution (if issued by the Client)
the command execution if issued by the Client
NOTE: Some errors can occur before the Server is able to read the "id" member,
in these cases the "id" member will not be part of the error response, even
@ -124,9 +114,9 @@ if provided by the client.
-----------------------
As a result of state changes, the Server may send messages unilaterally
to the Client at any time. They are called 'asynchronous events'.
to the Client at any time. They are called "asynchronous events".
The format is:
The format of asynchronous events is:
{ "event": json-string, "data": json-object,
"timestamp": { "seconds": json-number, "microseconds": json-number } }
@ -147,36 +137,37 @@ qmp-events.txt file.
===============
This section provides some examples of real QMP usage, in all of them
'C' stands for 'Client' and 'S' stands for 'Server'.
"C" stands for "Client" and "S" stands for "Server".
3.1 Server greeting
-------------------
S: {"QMP": {"version": {"qemu": "0.12.50", "package": ""}, "capabilities": []}}
S: { "QMP": { "version": { "qemu": { "micro": 50, "minor": 6, "major": 1 },
"package": ""}, "capabilities": []}}
3.2 Simple 'stop' execution
---------------------------
C: { "execute": "stop" }
S: {"return": {}}
S: { "return": {} }
3.3 KVM information
-------------------
C: { "execute": "query-kvm", "id": "example" }
S: {"return": {"enabled": true, "present": true}, "id": "example"}
S: { "return": { "enabled": true, "present": true }, "id": "example"}
3.4 Parsing error
------------------
C: { "execute": }
S: {"error": {"class": "GenericError", "desc": "Invalid JSON syntax" } }
S: { "error": { "class": "GenericError", "desc": "Invalid JSON syntax" } }
3.5 Powerdown event
-------------------
S: {"timestamp": {"seconds": 1258551470, "microseconds": 802384}, "event":
"POWERDOWN"}
S: { "timestamp": { "seconds": 1258551470, "microseconds": 802384 },
"event": "POWERDOWN" }
4. Capabilities Negotiation
----------------------------
@ -184,17 +175,17 @@ S: {"timestamp": {"seconds": 1258551470, "microseconds": 802384}, "event":
When a Client successfully establishes a connection, the Server is in
Capabilities Negotiation mode.
In this mode only the 'qmp_capabilities' command is allowed to run, all
other commands will return the CommandNotFound error. Asynchronous messages
are not delivered either.
In this mode only the qmp_capabilities command is allowed to run, all
other commands will return the CommandNotFound error. Asynchronous
messages are not delivered either.
Clients should use the 'qmp_capabilities' command to enable capabilities
Clients should use the qmp_capabilities command to enable capabilities
advertised in the Server's greeting (section '2.2 Server Greeting') they
support.
When the 'qmp_capabilities' command is issued, and if it does not return an
When the qmp_capabilities command is issued, and if it does not return an
error, the Server enters in Command mode where capabilities changes take
effect, all commands (except 'qmp_capabilities') are allowed and asynchronous
effect, all commands (except qmp_capabilities) are allowed and asynchronous
messages are delivered.
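The two modes described in this section can be modelled as a small state tracker. The Python sketch below is illustrative only; the class and method names are hypothetical and it mirrors the rules stated above rather than QEMU's actual implementation.

```python
class QmpModeTracker:
    """Track which mode a QMP connection is in, as described above."""

    NEGOTIATION = "capabilities-negotiation"
    COMMAND = "command"

    def __init__(self):
        # A freshly established connection starts in negotiation mode.
        self.mode = self.NEGOTIATION

    def allows(self, command):
        """Return True if the Server would accept this command now."""
        if self.mode == self.NEGOTIATION:
            return command == "qmp_capabilities"
        # In Command mode everything except qmp_capabilities is allowed.
        return command != "qmp_capabilities"

    def delivers_events(self):
        """Asynchronous messages are only delivered in Command mode."""
        return self.mode == self.COMMAND

    def command_succeeded(self, command):
        """Record a successful reply; qmp_capabilities switches the mode."""
        if self.mode == self.NEGOTIATION and command == "qmp_capabilities":
            self.mode = self.COMMAND
```

A client could consult allows() before sending a command to avoid a CommandNotFound error while still in negotiation mode.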
5 Compatibility Considerations
@ -245,7 +236,7 @@ arguments, errors, asynchronous events, and so forth.
Any new names downstream wishes to add must begin with '__'. To
ensure compatibility with other downstreams, it is strongly
recommended that you prefix your downstram names with '__RFQDN_' where
recommended that you prefix your downstream names with '__RFQDN_' where
RFQDN is a valid, reverse fully qualified domain name which you
control. For example, a qemu-kvm specific monitor command would be:


@ -1,7 +1,7 @@
(RDMA: Remote Direct Memory Access)
RDMA Live Migration Specification, Version # 1
==============================================
Wiki: http://wiki.qemu.org/Features/RDMALiveMigration
Wiki: http://wiki.qemu-project.org/Features/RDMALiveMigration
Github: git@github.com:hinesmr/qemu.git, 'rdma' branch
Copyright (C) 2013 Michael R. Hines <mrhines@us.ibm.com>

Some files were not shown because too many files have changed in this diff.