commit 9ef497755a

Merge tag 'pull-vfio-20230911' of https://github.com/legoater/qemu into staging

vfio queue:

* Small downtime optimisation for VFIO migration
* P2P support for VFIO migration
* Introduction of a save_prepare() handler to fail VFIO migration
* Fix on DMA logging ranges calculation for OVMF enabling dynamic window

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmT+uZQACgkQUaNDx8/7
# 7KGFSw//UIqSet6MUxZZh/t7yfNFUTnxx6iPdChC3BphBaDDh99FCQrw5mPZ8ImF
# 4rz0cIwSaHXraugEsC42TDaGjEmcAmYD0Crz+pSpLU21nKtYyWtZy6+9kyYslMNF
# bUq0UwD0RGTP+ZZi6GBy1hM30y/JbNAGeC6uX8kyJRuK5Korfzoa/X5h+B2XfouW
# 78G1mARHq5eOkGy91+rAJowdjqtkpKrzkfCJu83330Bb035qAT/PEzGs5LxdfTla
# ORNqWHy3W+d8ZBicBQ5vwrk6D5JIZWma7vdXJRhs1wGO615cuyt1L8nWLFr8klW5
# MJl+wM7DZ6UlSODq7r839GtSuWAnQc2j7JKc+iqZuBBk1v9fGXv2tZmtuTGkG2hN
# nYXSQfuq1igu1nGVdxJv6WorDxsK9wzLNO2ckrOcKTT28RFl8oCDNSPPTKpwmfb5
# i5RrGreeXXqRXIw0VHhq5EqpROLjAFwE9tkJndO8765Ag154plxssaKTUWo5wm7/
# kjQVuRuhs5nnMXfL9ixLZkwD1aFn5fWAIaR0psH5vGD0fnB1Pba+Ux9ZzHvxp5D8
# Kg3H6dKlht6VXdQ/qb0Up1LXCGEa70QM6Th2iO924ydZkkmqrSj+CFwGHvBsINa4
# 89fYd77nbRbdwWurj3JIznJYVipau2PmfbjZ/jTed4RxjBQ+fPA=
# =44e0
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 11 Sep 2023 02:54:12 EDT
# gpg:                using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@redhat.com>" [unknown]
# gpg:                 aka "Cédric Le Goater <clg@kaod.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1

* tag 'pull-vfio-20230911' of https://github.com/legoater/qemu:
  vfio/common: Separate vfio-pci ranges
  vfio/migration: Block VFIO migration with background snapshot
  vfio/migration: Block VFIO migration with postcopy migration
  migration: Add .save_prepare() handler to struct SaveVMHandlers
  migration: Move more initializations to migrate_init()
  vfio/migration: Fail adding device with enable-migration=on and existing blocker
  migration: Add migration prefix to functions in target.c
  vfio/migration: Allow migration of multiple P2P supporting devices
  vfio/migration: Add P2P support for VFIO migration
  vfio/migration: Refactor PRE_COPY and RUNNING state checks
  qdev: Add qdev_add_vm_change_state_handler_full()
  sysemu: Add prepare callback to struct VMChangeStateEntry
  vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup()

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
@@ -23,9 +23,21 @@ and recommends that the initial bytes are sent and loaded in the destination
 before stopping the source VM. Enabling this migration capability will
 guarantee that and thus, can potentially reduce downtime even further.
 
-Note that currently VFIO migration is supported only for a single device. This
-is due to VFIO migration's lack of P2P support. However, P2P support is planned
-to be added later on.
+To support migration of multiple devices that might do P2P transactions between
+themselves, VFIO migration uAPI defines an intermediate P2P quiescent state.
+While in the P2P quiescent state, P2P DMA transactions cannot be initiated by
+the device, but the device can respond to incoming ones. Additionally, all
+outstanding P2P transactions are guaranteed to have been completed by the time
+the device enters this state.
+
+All the devices that support P2P migration are first transitioned to the P2P
+quiescent state and only then are they stopped or started. This makes migration
+safe P2P-wise, since starting and stopping the devices is not done atomically
+for all the devices together.
+
+Thus, multiple VFIO devices migration is allowed only if all the devices
+support P2P migration. Single VFIO device migration is allowed regardless of
+P2P migration support.
 
 A detailed description of the UAPI for VFIO device migration can be found in
 the comment for the ``vfio_device_mig_state`` structure in the header file
@@ -132,54 +144,63 @@ will be blocked.
 Flow of state changes during Live migration
 ===========================================
 
-Below is the flow of state change during live migration.
+Below is the state change flow during live migration for a VFIO device that
+supports both precopy and P2P migration. The flow for devices that don't
+support it is similar, except that the relevant states for precopy and P2P are
+skipped.
 The values in the parentheses represent the VM state, the migration state, and
 the VFIO device state, respectively.
-The text in the square brackets represents the flow if the VFIO device supports
-pre-copy.
 
 Live migration save path
 ------------------------
 
 ::
 
                             QEMU normal running state
                             (RUNNING, _NONE, _RUNNING)
                                        |
                      migrate_init spawns migration_thread
                Migration thread then calls each device's .save_setup()
-                    (RUNNING, _SETUP, _RUNNING [_PRE_COPY])
+                         (RUNNING, _SETUP, _PRE_COPY)
                                        |
-                   (RUNNING, _ACTIVE, _RUNNING [_PRE_COPY])
+                        (RUNNING, _ACTIVE, _PRE_COPY)
   If device is active, get pending_bytes by .state_pending_{estimate,exact}()
         If total pending_bytes >= threshold_size, call .save_live_iterate()
-                [Data of VFIO device for pre-copy phase is copied]
+                 Data of VFIO device for pre-copy phase is copied
        Iterate till total pending bytes converge and are less than threshold
                                        |
-  On migration completion, vCPU stops and calls .save_live_complete_precopy for
-  each active device. The VFIO device is then transitioned into _STOP_COPY state
-                     (FINISH_MIGRATE, _DEVICE, _STOP_COPY)
+        On migration completion, the vCPUs and the VFIO device are stopped
+               The VFIO device is first put in P2P quiescent state
+                     (FINISH_MIGRATE, _ACTIVE, _PRE_COPY_P2P)
                                        |
-       For the VFIO device, iterate in .save_live_complete_precopy until
-                               pending data is 0
-                       (FINISH_MIGRATE, _DEVICE, _STOP)
-                                       |
-                     (FINISH_MIGRATE, _COMPLETED, _STOP)
-            Migraton thread schedules cleanup bottom half and exits
+                 Then the VFIO device is put in _STOP_COPY state
+                      (FINISH_MIGRATE, _ACTIVE, _STOP_COPY)
+          .save_live_complete_precopy() is called for each active device
+       For the VFIO device, iterate in .save_live_complete_precopy() until
+                               pending data is 0
+                                       |
+                     (POSTMIGRATE, _COMPLETED, _STOP_COPY)
+            Migraton thread schedules cleanup bottom half and exits
+                                       |
+                          .save_cleanup() is called
+                       (POSTMIGRATE, _COMPLETED, _STOP)
 
 Live migration resume path
 --------------------------
 
 ::
 
-              Incoming migration calls .load_setup for each device
+             Incoming migration calls .load_setup() for each device
                         (RESTORE_VM, _ACTIVE, _STOP)
                                        |
-     For each device, .load_state is called for that device section data
+    For each device, .load_state() is called for that device section data
                       (RESTORE_VM, _ACTIVE, _RESUMING)
                                        |
-  At the end, .load_cleanup is called for each device and vCPUs are started
+ At the end, .load_cleanup() is called for each device and vCPUs are started
+               The VFIO device is first put in P2P quiescent state
+                       (RUNNING, _ACTIVE, _RUNNING_P2P)
+                                       |
                         (RUNNING, _NONE, _RUNNING)
 
 Postcopy
 ========
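The save-path ordering documented above can be condensed into two state transitions. The sketch below is illustrative only and is not part of this series; the example_ function name is invented, while vfio_migration_set_state() and the vfio_device_mig_state values are the ones used elsewhere in these patches. It assumes a P2P-capable device currently in PRE_COPY when the VM stops, and omits error handling and the surrounding migration-core calls:

    /* Illustrative sketch of the stop ordering for a P2P-capable device. */
    static void example_vfio_stop_for_migration(VFIODevice *vbasedev)
    {
        /* Prepare phase: enter the P2P quiescent state first, so the device
         * stops initiating P2P DMA but can still respond to incoming ones. */
        vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_PRE_COPY_P2P,
                                 VFIO_DEVICE_STATE_ERROR);

        /* Main phase: only after every device is quiescent is each device
         * moved to STOP_COPY to produce its final state data. */
        vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY,
                                 VFIO_DEVICE_STATE_ERROR);

        /* STOP_COPY -> STOP is deferred to .save_cleanup() after migration
         * completes, so it does not add to downtime. */
    }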
@@ -55,8 +55,20 @@ static int qdev_get_dev_tree_depth(DeviceState *dev)
 VMChangeStateEntry *qdev_add_vm_change_state_handler(DeviceState *dev,
                                                      VMChangeStateHandler *cb,
                                                      void *opaque)
+{
+    return qdev_add_vm_change_state_handler_full(dev, cb, NULL, opaque);
+}
+
+/*
+ * Exactly like qdev_add_vm_change_state_handler() but passes a prepare_cb
+ * argument too.
+ */
+VMChangeStateEntry *qdev_add_vm_change_state_handler_full(
+    DeviceState *dev, VMChangeStateHandler *cb,
+    VMChangeStateHandler *prepare_cb, void *opaque)
 {
     int depth = qdev_get_dev_tree_depth(dev);
 
-    return qemu_add_vm_change_state_handler_prio(cb, opaque, depth);
+    return qemu_add_vm_change_state_handler_prio_full(cb, prepare_cb, opaque,
+                                                      depth);
 }
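A usage sketch of the new entry point follows. The device, callback names and realize function below are hypothetical; only the qdev_add_vm_change_state_handler_full() signature and the VMChangeStateHandler prototype come from this series:

    #include "qemu/osdep.h"
    #include "hw/qdev-core.h"
    #include "sysemu/runstate.h"

    /* Hypothetical device registering a prepare and a main callback. */
    static void mydev_vm_state_prepare(void *opaque, bool running, RunState state)
    {
        /* First phase: runs for all entries before any main callback,
         * e.g. quiesce outbound DMA here. */
    }

    static void mydev_vm_state_change(void *opaque, bool running, RunState state)
    {
        /* Second phase: start or stop the device to match the new run state. */
    }

    static void mydev_realize(DeviceState *dev, Error **errp)
    {
        qdev_add_vm_change_state_handler_full(dev, mydev_vm_state_change,
                                              mydev_vm_state_prepare, dev);
    }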
hw/vfio/common.c
@@ -27,6 +27,7 @@
 #include "hw/vfio/vfio-common.h"
 #include "hw/vfio/vfio.h"
+#include "hw/vfio/pci.h"
 #include "exec/address-spaces.h"
 #include "exec/memory.h"
 #include "exec/ram_addr.h"
@@ -363,41 +364,54 @@ bool vfio_mig_active(void)
 
 static Error *multiple_devices_migration_blocker;
 
-static unsigned int vfio_migratable_device_num(void)
+/*
+ * Multiple devices migration is allowed only if all devices support P2P
+ * migration. Single device migration is allowed regardless of P2P migration
+ * support.
+ */
+static bool vfio_multiple_devices_migration_is_supported(void)
 {
     VFIOGroup *group;
     VFIODevice *vbasedev;
     unsigned int device_num = 0;
+    bool all_support_p2p = true;
 
     QLIST_FOREACH(group, &vfio_group_list, next) {
         QLIST_FOREACH(vbasedev, &group->device_list, next) {
             if (vbasedev->migration) {
                 device_num++;
+
+                if (!(vbasedev->migration->mig_flags & VFIO_MIGRATION_P2P)) {
+                    all_support_p2p = false;
+                }
             }
         }
     }
 
-    return device_num;
+    return all_support_p2p || device_num <= 1;
 }
 
 int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
 {
     int ret;
 
-    if (multiple_devices_migration_blocker ||
-        vfio_migratable_device_num() <= 1) {
+    if (vfio_multiple_devices_migration_is_supported()) {
         return 0;
     }
 
     if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
-        error_setg(errp, "Migration is currently not supported with multiple "
-                         "VFIO devices");
+        error_setg(errp, "Multiple VFIO devices migration is supported only if "
+                         "all of them support P2P migration");
         return -EINVAL;
     }
 
+    if (multiple_devices_migration_blocker) {
+        return 0;
+    }
+
     error_setg(&multiple_devices_migration_blocker,
-               "Migration is currently not supported with multiple "
-               "VFIO devices");
+               "Multiple VFIO devices migration is supported only if all of "
+               "them support P2P migration");
     ret = migrate_add_blocker(multiple_devices_migration_blocker, errp);
     if (ret < 0) {
         error_free(multiple_devices_migration_blocker);
@@ -410,7 +424,7 @@ int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
 void vfio_unblock_multiple_devices_migration(void)
 {
     if (!multiple_devices_migration_blocker ||
-        vfio_migratable_device_num() > 1) {
+        !vfio_multiple_devices_migration_is_supported()) {
         return;
     }
 
@@ -437,6 +451,22 @@ static void vfio_set_migration_error(int err)
     }
 }
 
+bool vfio_device_state_is_running(VFIODevice *vbasedev)
+{
+    VFIOMigration *migration = vbasedev->migration;
+
+    return migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
+           migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P;
+}
+
+bool vfio_device_state_is_precopy(VFIODevice *vbasedev)
+{
+    VFIOMigration *migration = vbasedev->migration;
+
+    return migration->device_state == VFIO_DEVICE_STATE_PRE_COPY ||
+           migration->device_state == VFIO_DEVICE_STATE_PRE_COPY_P2P;
+}
+
 static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
 {
     VFIOGroup *group;
@@ -457,8 +487,8 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
             }
 
             if (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF &&
-                (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
-                 migration->device_state == VFIO_DEVICE_STATE_PRE_COPY)) {
+                (vfio_device_state_is_running(vbasedev) ||
+                 vfio_device_state_is_precopy(vbasedev))) {
                 return false;
             }
         }
@@ -503,8 +533,8 @@ static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
                 return false;
             }
 
-            if (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
-                migration->device_state == VFIO_DEVICE_STATE_PRE_COPY) {
+            if (vfio_device_state_is_running(vbasedev) ||
+                vfio_device_state_is_precopy(vbasedev)) {
                 continue;
             } else {
                 return false;
@@ -1371,6 +1401,8 @@ typedef struct VFIODirtyRanges {
     hwaddr max32;
     hwaddr min64;
    hwaddr max64;
+    hwaddr minpci64;
+    hwaddr maxpci64;
 } VFIODirtyRanges;
 
 typedef struct VFIODirtyRangesListener {
@@ -1379,6 +1411,31 @@ typedef struct VFIODirtyRangesListener {
     MemoryListener listener;
 } VFIODirtyRangesListener;
 
+static bool vfio_section_is_vfio_pci(MemoryRegionSection *section,
+                                     VFIOContainer *container)
+{
+    VFIOPCIDevice *pcidev;
+    VFIODevice *vbasedev;
+    VFIOGroup *group;
+    Object *owner;
+
+    owner = memory_region_owner(section->mr);
+
+    QLIST_FOREACH(group, &container->group_list, container_next) {
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            if (vbasedev->type != VFIO_DEVICE_TYPE_PCI) {
+                continue;
+            }
+            pcidev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+            if (OBJECT(pcidev) == owner) {
+                return true;
+            }
+        }
+    }
+
+    return false;
+}
+
 static void vfio_dirty_tracking_update(MemoryListener *listener,
                                        MemoryRegionSection *section)
 {
@@ -1395,19 +1452,32 @@ static void vfio_dirty_tracking_update(MemoryListener *listener,
     }
 
     /*
-     * The address space passed to the dirty tracker is reduced to two ranges:
-     * one for 32-bit DMA ranges, and another one for 64-bit DMA ranges.
+     * The address space passed to the dirty tracker is reduced to three ranges:
+     * one for 32-bit DMA ranges, one for 64-bit DMA ranges and one for the
+     * PCI 64-bit hole.
+     *
      * The underlying reports of dirty will query a sub-interval of each of
     * these ranges.
      *
-     * The purpose of the dual range handling is to handle known cases of big
-     * holes in the address space, like the x86 AMD 1T hole. The alternative
-     * would be an IOVATree but that has a much bigger runtime overhead and
-     * unnecessary complexity.
+     * The purpose of the three range handling is to handle known cases of big
+     * holes in the address space, like the x86 AMD 1T hole, and firmware (like
+     * OVMF) which may relocate the pci-hole64 to the end of the address space.
+     * The latter would otherwise generate large ranges for tracking, stressing
+     * the limits of supported hardware. The pci-hole32 will always be below 4G
+     * (overlapping or not) so it doesn't need special handling and is part of
+     * the 32-bit range.
+     *
+     * The alternative would be an IOVATree but that has a much bigger runtime
+     * overhead and unnecessary complexity.
      */
-    min = (end <= UINT32_MAX) ? &range->min32 : &range->min64;
-    max = (end <= UINT32_MAX) ? &range->max32 : &range->max64;
-
+    if (vfio_section_is_vfio_pci(section, dirty->container) &&
+        iova >= UINT32_MAX) {
+        min = &range->minpci64;
+        max = &range->maxpci64;
+    } else {
+        min = (end <= UINT32_MAX) ? &range->min32 : &range->min64;
+        max = (end <= UINT32_MAX) ? &range->max32 : &range->max64;
+    }
     if (*min > iova) {
         *min = iova;
     }
@@ -1432,6 +1502,7 @@ static void vfio_dirty_tracking_init(VFIOContainer *container,
     memset(&dirty, 0, sizeof(dirty));
     dirty.ranges.min32 = UINT32_MAX;
     dirty.ranges.min64 = UINT64_MAX;
+    dirty.ranges.minpci64 = UINT64_MAX;
     dirty.listener = vfio_dirty_tracking_listener;
     dirty.container = container;
 
@@ -1502,7 +1573,8 @@ vfio_device_feature_dma_logging_start_create(VFIOContainer *container,
     * DMA logging uAPI guarantees to support at least a number of ranges that
     * fits into a single host kernel base page.
     */
-    control->num_ranges = !!tracking->max32 + !!tracking->max64;
+    control->num_ranges = !!tracking->max32 + !!tracking->max64 +
+                          !!tracking->maxpci64;
    ranges = g_try_new0(struct vfio_device_feature_dma_logging_range,
                        control->num_ranges);
    if (!ranges) {
@@ -1521,11 +1593,17 @@ vfio_device_feature_dma_logging_start_create(VFIOContainer *container,
     if (tracking->max64) {
         ranges->iova = tracking->min64;
         ranges->length = (tracking->max64 - tracking->min64) + 1;
+        ranges++;
+    }
+    if (tracking->maxpci64) {
+        ranges->iova = tracking->minpci64;
+        ranges->length = (tracking->maxpci64 - tracking->minpci64) + 1;
     }
 
     trace_vfio_device_dirty_tracking_start(control->num_ranges,
                                            tracking->min32, tracking->max32,
-                                           tracking->min64, tracking->max64);
+                                           tracking->min64, tracking->max64,
+                                           tracking->minpci64, tracking->maxpci64);
 
     return feature;
 }
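The range selection above amounts to a three-way classification. The helper below is only an illustrative restatement of the logic inside vfio_dirty_tracking_update(); the example_ name and the standalone form are invented and are not code from the patch:

    /* Decide which tracker an IOVA interval updates: the 32-bit range, the
     * 64-bit range, or the (possibly firmware-relocated) PCI 64-bit hole. */
    static void example_pick_dirty_range(VFIODirtyRanges *range,
                                         bool section_is_vfio_pci,
                                         hwaddr iova, hwaddr end,
                                         hwaddr **min, hwaddr **max)
    {
        if (section_is_vfio_pci && iova >= UINT32_MAX) {
            *min = &range->minpci64;
            *max = &range->maxpci64;
        } else if (end <= UINT32_MAX) {
            *min = &range->min32;
            *max = &range->max32;
        } else {
            *min = &range->min64;
            *max = &range->max64;
        }
    }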
@@ -71,8 +71,12 @@ static const char *mig_state_to_str(enum vfio_device_mig_state state)
         return "STOP_COPY";
     case VFIO_DEVICE_STATE_RESUMING:
         return "RESUMING";
+    case VFIO_DEVICE_STATE_RUNNING_P2P:
+        return "RUNNING_P2P";
     case VFIO_DEVICE_STATE_PRE_COPY:
         return "PRE_COPY";
+    case VFIO_DEVICE_STATE_PRE_COPY_P2P:
+        return "PRE_COPY_P2P";
     default:
         return "UNKNOWN STATE";
     }
@@ -331,6 +335,36 @@ static bool vfio_precopy_supported(VFIODevice *vbasedev)
 
 /* ---------------------------------------------------------------------- */
 
+static int vfio_save_prepare(void *opaque, Error **errp)
+{
+    VFIODevice *vbasedev = opaque;
+
+    /*
+     * Snapshot doesn't use postcopy nor background snapshot, so allow snapshot
+     * even if they are on.
+     */
+    if (runstate_check(RUN_STATE_SAVE_VM)) {
+        return 0;
+    }
+
+    if (migrate_postcopy_ram()) {
+        error_setg(
+            errp, "%s: VFIO migration is not supported with postcopy migration",
+            vbasedev->name);
+        return -EOPNOTSUPP;
+    }
+
+    if (migrate_background_snapshot()) {
+        error_setg(
+            errp,
+            "%s: VFIO migration is not supported with background snapshot",
+            vbasedev->name);
+        return -EOPNOTSUPP;
+    }
+
+    return 0;
+}
+
 static int vfio_save_setup(QEMUFile *f, void *opaque)
 {
     VFIODevice *vbasedev = opaque;
@@ -383,6 +417,19 @@ static void vfio_save_cleanup(void *opaque)
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
 
+    /*
+     * Changing device state from STOP_COPY to STOP can take time. Do it here,
+     * after migration has completed, so it won't increase downtime.
+     */
+    if (migration->device_state == VFIO_DEVICE_STATE_STOP_COPY) {
+        /*
+         * If setting the device in STOP state fails, the device should be
+         * reset. To do so, use ERROR state as a recover state.
+         */
+        vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
+                                 VFIO_DEVICE_STATE_ERROR);
+    }
+
     g_free(migration->data_buffer);
     migration->data_buffer = NULL;
     migration->precopy_init_size = 0;
@@ -398,7 +445,7 @@ static void vfio_state_pending_estimate(void *opaque, uint64_t *must_precopy,
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
 
-    if (migration->device_state != VFIO_DEVICE_STATE_PRE_COPY) {
+    if (!vfio_device_state_is_precopy(vbasedev)) {
         return;
     }
 
@@ -431,7 +478,7 @@ static void vfio_state_pending_exact(void *opaque, uint64_t *must_precopy,
     vfio_query_stop_copy_size(vbasedev, &stop_copy_size);
     *must_precopy += stop_copy_size;
 
-    if (migration->device_state == VFIO_DEVICE_STATE_PRE_COPY) {
+    if (vfio_device_state_is_precopy(vbasedev)) {
         vfio_query_precopy_size(migration);
 
         *must_precopy +=
@@ -446,9 +493,8 @@ static bool vfio_is_active_iterate(void *opaque)
 static bool vfio_is_active_iterate(void *opaque)
 {
     VFIODevice *vbasedev = opaque;
-    VFIOMigration *migration = vbasedev->migration;
 
-    return migration->device_state == VFIO_DEVICE_STATE_PRE_COPY;
+    return vfio_device_state_is_precopy(vbasedev);
 }
 
 static int vfio_save_iterate(QEMUFile *f, void *opaque)
@@ -508,12 +554,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
         return ret;
     }
 
-    /*
-     * If setting the device in STOP state fails, the device should be reset.
-     * To do so, use ERROR state as a recover state.
-     */
-    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
-                                   VFIO_DEVICE_STATE_ERROR);
     trace_vfio_save_complete_precopy(vbasedev->name, ret);
 
     return ret;
@@ -630,6 +670,7 @@ static bool vfio_switchover_ack_needed(void *opaque)
 }
 
 static const SaveVMHandlers savevm_vfio_handlers = {
+    .save_prepare = vfio_save_prepare,
     .save_setup = vfio_save_setup,
     .save_cleanup = vfio_save_cleanup,
     .state_pending_estimate = vfio_state_pending_estimate,
@@ -646,18 +687,50 @@ static const SaveVMHandlers savevm_vfio_handlers = {
 
 /* ---------------------------------------------------------------------- */
 
-static void vfio_vmstate_change(void *opaque, bool running, RunState state)
+static void vfio_vmstate_change_prepare(void *opaque, bool running,
+                                        RunState state)
 {
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
     enum vfio_device_mig_state new_state;
     int ret;
 
+    new_state = migration->device_state == VFIO_DEVICE_STATE_PRE_COPY ?
+                    VFIO_DEVICE_STATE_PRE_COPY_P2P :
+                    VFIO_DEVICE_STATE_RUNNING_P2P;
+
+    /*
+     * If setting the device in new_state fails, the device should be reset.
+     * To do so, use ERROR state as a recover state.
+     */
+    ret = vfio_migration_set_state(vbasedev, new_state,
+                                   VFIO_DEVICE_STATE_ERROR);
+    if (ret) {
+        /*
+         * Migration should be aborted in this case, but vm_state_notify()
+         * currently does not support reporting failures.
+         */
+        if (migrate_get_current()->to_dst_file) {
+            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
+        }
+    }
+
+    trace_vfio_vmstate_change_prepare(vbasedev->name, running,
+                                      RunState_str(state),
+                                      mig_state_to_str(new_state));
+}
+
+static void vfio_vmstate_change(void *opaque, bool running, RunState state)
+{
+    VFIODevice *vbasedev = opaque;
+    enum vfio_device_mig_state new_state;
+    int ret;
+
     if (running) {
         new_state = VFIO_DEVICE_STATE_RUNNING;
     } else {
         new_state =
-            (migration->device_state == VFIO_DEVICE_STATE_PRE_COPY &&
+            (vfio_device_state_is_precopy(vbasedev) &&
             (state == RUN_STATE_FINISH_MIGRATE || state == RUN_STATE_PAUSED)) ?
                 VFIO_DEVICE_STATE_STOP_COPY :
                 VFIO_DEVICE_STATE_STOP;
@@ -753,6 +826,7 @@ static int vfio_migration_init(VFIODevice *vbasedev)
     char id[256] = "";
     g_autofree char *path = NULL, *oid = NULL;
     uint64_t mig_flags = 0;
+    VMChangeStateHandler *prepare_cb;
 
     if (!vbasedev->ops->vfio_get_object) {
         return -EINVAL;
@@ -793,9 +867,11 @@ static int vfio_migration_init(VFIODevice *vbasedev)
     register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, &savevm_vfio_handlers,
                          vbasedev);
 
-    migration->vm_state = qdev_add_vm_change_state_handler(vbasedev->dev,
-                                                           vfio_vmstate_change,
-                                                           vbasedev);
+    prepare_cb = migration->mig_flags & VFIO_MIGRATION_P2P ?
+                     vfio_vmstate_change_prepare :
+                     NULL;
+    migration->vm_state = qdev_add_vm_change_state_handler_full(
+        vbasedev->dev, vfio_vmstate_change, prepare_cb, vbasedev);
     migration->migration_state.notify = vfio_migration_state_notifier;
     add_migration_state_change_notifier(&migration->migration_state);
 
|
@ -104,7 +104,7 @@ vfio_known_safe_misalignment(const char *name, uint64_t iova, uint64_t offset_wi
|
||||||
vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t size, uint64_t page_size) "Region \"%s\" 0x%"PRIx64" size=0x%"PRIx64" is not aligned to 0x%"PRIx64" and cannot be mapped for DMA"
|
vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t size, uint64_t page_size) "Region \"%s\" 0x%"PRIx64" size=0x%"PRIx64" is not aligned to 0x%"PRIx64" and cannot be mapped for DMA"
|
||||||
vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" - 0x%"PRIx64
|
vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" - 0x%"PRIx64
|
||||||
vfio_device_dirty_tracking_update(uint64_t start, uint64_t end, uint64_t min, uint64_t max) "section 0x%"PRIx64" - 0x%"PRIx64" -> update [0x%"PRIx64" - 0x%"PRIx64"]"
|
vfio_device_dirty_tracking_update(uint64_t start, uint64_t end, uint64_t min, uint64_t max) "section 0x%"PRIx64" - 0x%"PRIx64" -> update [0x%"PRIx64" - 0x%"PRIx64"]"
|
||||||
vfio_device_dirty_tracking_start(int nr_ranges, uint64_t min32, uint64_t max32, uint64_t min64, uint64_t max64) "nr_ranges %d 32:[0x%"PRIx64" - 0x%"PRIx64"], 64:[0x%"PRIx64" - 0x%"PRIx64"]"
|
vfio_device_dirty_tracking_start(int nr_ranges, uint64_t min32, uint64_t max32, uint64_t min64, uint64_t max64, uint64_t minpci, uint64_t maxpci) "nr_ranges %d 32:[0x%"PRIx64" - 0x%"PRIx64"], 64:[0x%"PRIx64" - 0x%"PRIx64"], pci64:[0x%"PRIx64" - 0x%"PRIx64"]"
|
||||||
vfio_disconnect_container(int fd) "close container->fd=%d"
|
vfio_disconnect_container(int fd) "close container->fd=%d"
|
||||||
vfio_put_group(int fd) "close group->fd=%d"
|
vfio_put_group(int fd) "close group->fd=%d"
|
||||||
vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
|
vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
|
||||||
|
@ -167,3 +167,4 @@ vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer
|
||||||
vfio_state_pending_estimate(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
|
vfio_state_pending_estimate(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
|
||||||
vfio_state_pending_exact(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" stopcopy size 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
|
vfio_state_pending_exact(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" stopcopy size 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
|
||||||
vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
|
vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
|
||||||
|
vfio_vmstate_change_prepare(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
|
||||||
|
|
|
@@ -230,6 +230,8 @@ void vfio_unblock_multiple_devices_migration(void);
 bool vfio_viommu_preset(VFIODevice *vbasedev);
 int64_t vfio_mig_bytes_transferred(void);
 void vfio_reset_bytes_transferred(void);
+bool vfio_device_state_is_running(VFIODevice *vbasedev);
+bool vfio_device_state_is_precopy(VFIODevice *vbasedev);
 
 #ifdef CONFIG_LINUX
 int vfio_get_region_info(VFIODevice *vbasedev, int index,
@@ -20,6 +20,11 @@ typedef struct SaveVMHandlers {
     /* This runs inside the iothread lock. */
     SaveStateHandler *save_state;
 
+    /*
+     * save_prepare is called early, even before migration starts, and can be
+     * used to perform early checks.
+     */
+    int (*save_prepare)(void *opaque, Error **errp);
     void (*save_cleanup)(void *opaque);
     int (*save_live_complete_postcopy)(QEMUFile *f, void *opaque);
     int (*save_live_complete_precopy)(QEMUFile *f, void *opaque);
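As a usage sketch of the new handler: the device and function names below are hypothetical, includes and the register_savevm_live() registration are omitted, and only .save_prepare, SaveVMHandlers, error_setg() and -EOPNOTSUPP are taken from the tree. It shows how a device can now fail a migration attempt before it starts:

    static int mydev_save_prepare(void *opaque, Error **errp)
    {
        /* Runs early, before the migration stream is set up; returning an
         * error fails the migration attempt instead of aborting mid-stream. */
        if (!mydev_can_migrate(opaque)) {   /* hypothetical device check */
            error_setg(errp, "mydev: current device state cannot be migrated");
            return -EOPNOTSUPP;
        }
        return 0;
    }

    static const SaveVMHandlers savevm_mydev_handlers = {
        .save_prepare = mydev_save_prepare,
        /* .save_setup, .save_live_iterate, ... as before */
    };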
@@ -16,9 +16,16 @@ VMChangeStateEntry *qemu_add_vm_change_state_handler(VMChangeStateHandler *cb,
                                                      void *opaque);
 VMChangeStateEntry *qemu_add_vm_change_state_handler_prio(
     VMChangeStateHandler *cb, void *opaque, int priority);
+VMChangeStateEntry *
+qemu_add_vm_change_state_handler_prio_full(VMChangeStateHandler *cb,
+                                           VMChangeStateHandler *prepare_cb,
+                                           void *opaque, int priority);
 VMChangeStateEntry *qdev_add_vm_change_state_handler(DeviceState *dev,
                                                      VMChangeStateHandler *cb,
                                                      void *opaque);
+VMChangeStateEntry *qdev_add_vm_change_state_handler_full(
+    DeviceState *dev, VMChangeStateHandler *cb,
+    VMChangeStateHandler *prepare_cb, void *opaque);
 void qemu_del_vm_change_state_handler(VMChangeStateEntry *e);
 /**
  * vm_state_notify: Notify the state of the VM
@@ -1039,7 +1039,7 @@ static void fill_source_migration_info(MigrationInfo *info)
         populate_time_info(info, s);
         populate_ram_info(info, s);
         populate_disk_info(info);
-        populate_vfio_info(info);
+        migration_populate_vfio_info(info);
         break;
     case MIGRATION_STATUS_COLO:
         info->has_status = true;
@@ -1048,7 +1048,7 @@ static void fill_source_migration_info(MigrationInfo *info)
     case MIGRATION_STATUS_COMPLETED:
         populate_time_info(info, s);
         populate_ram_info(info, s);
-        populate_vfio_info(info);
+        migration_populate_vfio_info(info);
         break;
     case MIGRATION_STATUS_FAILED:
         info->has_status = true;
@@ -1392,8 +1392,15 @@ bool migration_is_active(MigrationState *s)
             s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
 }
 
-void migrate_init(MigrationState *s)
+int migrate_init(MigrationState *s, Error **errp)
 {
+    int ret;
+
+    ret = qemu_savevm_state_prepare(errp);
+    if (ret) {
+        return ret;
+    }
+
     /*
      * Reinitialise all migration state, except
      * parameters/capabilities that the user set, and
@@ -1425,6 +1432,15 @@ void migrate_init(MigrationState *s)
     s->iteration_initial_bytes = 0;
     s->threshold_size = 0;
     s->switchover_acked = false;
+    /*
+     * set mig_stats compression_counters memory to zero for a
+     * new migration
+     */
+    memset(&mig_stats, 0, sizeof(mig_stats));
+    memset(&compression_counters, 0, sizeof(compression_counters));
+    migration_reset_vfio_bytes_transferred();
+
+    return 0;
 }
 
 int migrate_add_blocker_internal(Error *reason, Error **errp)
@@ -1634,14 +1650,9 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
         migrate_set_block_incremental(true);
     }
 
-    migrate_init(s);
-    /*
-     * set mig_stats compression_counters memory to zero for a
-     * new migration
-     */
-    memset(&mig_stats, 0, sizeof(mig_stats));
-    memset(&compression_counters, 0, sizeof(compression_counters));
-    reset_vfio_bytes_transferred();
+    if (migrate_init(s, errp)) {
+        return false;
+    }
 
     return true;
 }
@@ -472,7 +472,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in);
 bool migration_is_setup_or_active(int state);
 bool migration_is_running(int state);
 
-void migrate_init(MigrationState *s);
+int migrate_init(MigrationState *s, Error **errp);
 bool migration_is_blocked(Error **errp);
 /* True if outgoing migration has entered postcopy phase */
 bool migration_in_postcopy(void);
@@ -512,8 +512,8 @@ void migration_consume_urgent_request(void);
 bool migration_rate_limit(void);
 void migration_cancel(const Error *error);
 
-void populate_vfio_info(MigrationInfo *info);
-void reset_vfio_bytes_transferred(void);
+void migration_populate_vfio_info(MigrationInfo *info);
+void migration_reset_vfio_bytes_transferred(void);
 void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
 
 #endif
@@ -1233,6 +1233,30 @@ bool qemu_savevm_state_guest_unplug_pending(void)
     return false;
 }
 
+int qemu_savevm_state_prepare(Error **errp)
+{
+    SaveStateEntry *se;
+    int ret;
+
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (!se->ops || !se->ops->save_prepare) {
+            continue;
+        }
+        if (se->ops->is_active) {
+            if (!se->ops->is_active(se->opaque)) {
+                continue;
+            }
+        }
+
+        ret = se->ops->save_prepare(se->opaque, errp);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
 void qemu_savevm_state_setup(QEMUFile *f)
 {
     MigrationState *ms = migrate_get_current();
@@ -1619,10 +1643,10 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
         return -EINVAL;
     }
 
-    migrate_init(ms);
-    memset(&mig_stats, 0, sizeof(mig_stats));
-    memset(&compression_counters, 0, sizeof(compression_counters));
-    reset_vfio_bytes_transferred();
+    ret = migrate_init(ms, errp);
+    if (ret) {
+        return ret;
+    }
     ms->to_dst_file = f;
 
     qemu_mutex_unlock_iothread();
@@ -31,6 +31,7 @@
 
 bool qemu_savevm_state_blocked(Error **errp);
 void qemu_savevm_non_migratable_list(strList **reasons);
+int qemu_savevm_state_prepare(Error **errp);
 void qemu_savevm_state_setup(QEMUFile *f);
 bool qemu_savevm_state_guest_unplug_pending(void);
 int qemu_savevm_state_resume_prepare(MigrationState *s);
@@ -15,7 +15,7 @@
 #endif
 
 #ifdef CONFIG_VFIO
-void populate_vfio_info(MigrationInfo *info)
+void migration_populate_vfio_info(MigrationInfo *info)
 {
     if (vfio_mig_active()) {
         info->vfio = g_malloc0(sizeof(*info->vfio));
@@ -23,16 +23,16 @@ void populate_vfio_info(MigrationInfo *info)
     }
 }
 
-void reset_vfio_bytes_transferred(void)
+void migration_reset_vfio_bytes_transferred(void)
 {
     vfio_reset_bytes_transferred();
 }
 #else
-void populate_vfio_info(MigrationInfo *info)
+void migration_populate_vfio_info(MigrationInfo *info)
 {
 }
 
-void reset_vfio_bytes_transferred(void)
+void migration_reset_vfio_bytes_transferred(void)
 {
 }
 #endif
@@ -271,6 +271,7 @@ void qemu_system_vmstop_request(RunState state)
 }
 struct VMChangeStateEntry {
     VMChangeStateHandler *cb;
+    VMChangeStateHandler *prepare_cb;
     void *opaque;
     QTAILQ_ENTRY(VMChangeStateEntry) entries;
     int priority;
@@ -293,12 +294,39 @@ static QTAILQ_HEAD(, VMChangeStateEntry) vm_change_state_head =
  */
 VMChangeStateEntry *qemu_add_vm_change_state_handler_prio(
     VMChangeStateHandler *cb, void *opaque, int priority)
+{
+    return qemu_add_vm_change_state_handler_prio_full(cb, NULL, opaque,
+                                                      priority);
+}
+
+/**
+ * qemu_add_vm_change_state_handler_prio_full:
+ * @cb: the main callback to invoke
+ * @prepare_cb: a callback to invoke before the main callback
+ * @opaque: user data passed to the callbacks
+ * @priority: low priorities execute first when the vm runs and the reverse is
+ *            true when the vm stops
+ *
+ * Register a main callback function and an optional prepare callback function
+ * that are invoked when the vm starts or stops running. The main callback and
+ * the prepare callback are called in two separate phases: First all prepare
+ * callbacks are called and only then all main callbacks are called. As its
+ * name suggests, the prepare callback can be used to do some preparatory work
+ * before invoking the main callback.
+ *
+ * Returns: an entry to be freed using qemu_del_vm_change_state_handler()
+ */
+VMChangeStateEntry *
+qemu_add_vm_change_state_handler_prio_full(VMChangeStateHandler *cb,
+                                           VMChangeStateHandler *prepare_cb,
+                                           void *opaque, int priority)
 {
     VMChangeStateEntry *e;
     VMChangeStateEntry *other;
 
     e = g_malloc0(sizeof(*e));
     e->cb = cb;
+    e->prepare_cb = prepare_cb;
     e->opaque = opaque;
     e->priority = priority;
 
@@ -333,10 +361,22 @@ void vm_state_notify(bool running, RunState state)
     trace_vm_state_notify(running, state, RunState_str(state));
 
     if (running) {
+        QTAILQ_FOREACH_SAFE(e, &vm_change_state_head, entries, next) {
+            if (e->prepare_cb) {
+                e->prepare_cb(e->opaque, running, state);
+            }
+        }
+
         QTAILQ_FOREACH_SAFE(e, &vm_change_state_head, entries, next) {
             e->cb(e->opaque, running, state);
         }
     } else {
+        QTAILQ_FOREACH_REVERSE_SAFE(e, &vm_change_state_head, entries, next) {
+            if (e->prepare_cb) {
+                e->prepare_cb(e->opaque, running, state);
+            }
+        }
+
         QTAILQ_FOREACH_REVERSE_SAFE(e, &vm_change_state_head, entries, next) {
             e->cb(e->opaque, running, state);
         }