xemu/hw
David Hildenbrand 910b25766b virtio-mem: Paravirtualized memory hot(un)plug
This is the very basic/initial version of virtio-mem. An introduction to
virtio-mem can be found in the Linux kernel driver [1]. While it can be
used in the current state for hotplug of a smaller amount of memory, it
will heavily benefit from resizeable memory regions in the future.

Each virtio-mem device manages a memory region (provided via a memory
backend). After requested by the hypervisor ("requested-size"), the
guest can try to plug/unplug blocks of memory within that region, in order
to reach the requested size. Initially, and after a reboot, all memory is
unplugged (except in special cases - reboot during postcopy).

The guest may only try to plug/unplug blocks of memory within the usable
region size. The usable region size is a little bigger than the
requested size, to give the device driver some flexibility. The usable
region size will only grow, except on reboots or when all memory is
requested to get unplugged. The guest can never plug more memory than
requested. Unplugged memory will get zapped/discarded, similar to in a
balloon device.

The block size is variable, however, it is always chosen in a way such that
THP splits are avoided (e.g., 2MB). The state of each block
(plugged/unplugged) is tracked in a bitmap.

As virtio-mem devices (e.g., virtio-mem-pci) will be memory devices, we now
expose "VirtioMEMDeviceInfo" via "query-memory-devices".

--------------------------------------------------------------------------

There are two important follow-up items that are in the works:
1. Resizeable memory regions: Use resizeable allocations/RAM blocks to
   grow/shrink along with the usable region size. This avoids creating
   initially very big VMAs, RAM blocks, and KVM slots.
2. Protection of unplugged memory: Make sure the gust cannot actually
   make use of unplugged memory.

Other follow-up items that are in the works:
1. Exclude unplugged memory during migration (via precopy notifier).
2. Handle remapping of memory.
3. Support for other architectures.

--------------------------------------------------------------------------

Example usage (virtio-mem-pci is introduced in follow-up patches):

Start QEMU with two virtio-mem devices (one per NUMA node):
 $ qemu-system-x86_64 -m 4G,maxmem=20G \
  -smp sockets=2,cores=2 \
  -numa node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 \
  [...]
  -object memory-backend-ram,id=mem0,size=8G \
  -device virtio-mem-pci,id=vm0,memdev=mem0,node=0,requested-size=0M \
  -object memory-backend-ram,id=mem1,size=8G \
  -device virtio-mem-pci,id=vm1,memdev=mem1,node=1,requested-size=1G

Query the configuration:
 (qemu) info memory-devices
 Memory device [virtio-mem]: "vm0"
   memaddr: 0x140000000
   node: 0
   requested-size: 0
   size: 0
   max-size: 8589934592
   block-size: 2097152
   memdev: /objects/mem0
 Memory device [virtio-mem]: "vm1"
   memaddr: 0x340000000
   node: 1
   requested-size: 1073741824
   size: 1073741824
   max-size: 8589934592
   block-size: 2097152
   memdev: /objects/mem1

Add some memory to node 0:
 (qemu) qom-set vm0 requested-size 500M

Remove some memory from node 1:
 (qemu) qom-set vm1 requested-size 200M

Query the configuration again:
 (qemu) info memory-devices
 Memory device [virtio-mem]: "vm0"
   memaddr: 0x140000000
   node: 0
   requested-size: 524288000
   size: 524288000
   max-size: 8589934592
   block-size: 2097152
   memdev: /objects/mem0
 Memory device [virtio-mem]: "vm1"
   memaddr: 0x340000000
   node: 1
   requested-size: 209715200
   size: 209715200
   max-size: 8589934592
   block-size: 2097152
   memdev: /objects/mem1

[1] https://lkml.kernel.org/r/20200311171422.10484-1-david@redhat.com

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20200626072248.78761-11-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-07-03 07:57:04 -04:00
..
9pfs xen/9pfs: increase max ring order to 9 2020-05-25 11:45:40 +02:00
acpi Rename use_acpi_pci_hotplug to more appropriate use_acpi_hotplug_bridge 2020-06-24 19:03:57 -04:00
adc hw/adc/stm32f2xx_adc: Correct memory region size and access size 2020-06-05 17:23:09 +01:00
alpha sysbus: Convert to sysbus_realize() etc. with Coccinelle 2020-06-15 22:05:28 +02:00
arm target-arm queue: 2020-06-26 18:22:36 +01:00
audio hw/audio/gus: Fix registers 32-bit access 2020-06-19 11:20:09 +02:00
block virtio,acpi,pci: fixes, cleanups. 2020-06-25 16:52:42 +01:00
char ibex_uart: fix XOR-as-pow 2020-06-26 09:39:40 -04:00
core vmport: move compat properties to hw_compat_5_0 2020-06-26 09:39:40 -04:00
cpu sysbus: Convert qdev_set_parent_bus() use with Coccinelle, part 3 2020-06-15 22:06:04 +02:00
cris sysbus: Convert to sysbus_realize() etc. with Coccinelle 2020-06-15 22:05:28 +02:00
display sysbus: Convert qdev_set_parent_bus() use with Coccinelle, part 3 2020-06-15 22:06:04 +02:00
dma sysbus: Convert to sysbus_realize() etc. with Coccinelle 2020-06-15 22:05:28 +02:00
gpio hw/unicore32/puv3: Use qemu_log_mask(ERROR) instead of debug printf() 2020-06-09 19:01:56 +02:00
hppa sysbus: Convert to sysbus_realize() etc. with Coccinelle 2020-06-15 22:05:28 +02:00
hyperv hyperv: vmbus: Remove the 2nd IRQ 2020-06-26 09:39:40 -04:00
i2c hw/i2c/core: Add i2c_try_create_slave() and i2c_realize_and_unref() 2020-06-26 14:30:28 +01:00
i386 pc: Support coldplugging of virtio-pmem-pci devices on all buses 2020-07-02 05:54:59 -04:00
ide qdev: Make qdev_prop_set_drive() match the other helpers 2020-06-23 16:07:07 +02:00
input adb: add ADB bus trace events 2020-06-26 10:13:52 +01:00
intc hw/intc: Add Loongson LIOINTC support 2020-06-27 19:42:22 +02:00
ipack qdev: Unrealize must not fail 2020-05-15 07:08:14 +02:00
ipmi various: Remove unnecessary OBJECT() cast 2020-05-15 07:08:14 +02:00
isa fdc: Reject clash between -drive if=floppy and -global isa-fdc 2020-06-23 16:07:07 +02:00
lm32 sysbus: Convert to sysbus_realize() etc. with Coccinelle 2020-06-15 22:05:28 +02:00
m68k qdev: Make qdev_prop_set_drive() match the other helpers 2020-06-23 16:07:07 +02:00
mem nvdimm: Plug memory leak in uuid property setter 2020-05-27 07:44:59 +02:00
microblaze qdev: Make qdev_prop_set_drive() match the other helpers 2020-06-23 16:07:07 +02:00
mips sysbus: Convert qdev_set_parent_bus() use with Coccinelle, part 1 2020-06-15 22:06:04 +02:00
misc hw/misc/pca9552: Add missing TypeInfo::class_size field 2020-06-29 21:16:10 +01:00
moxie hw: Make MachineClass::is_default a boolean type 2020-02-28 14:57:19 -05:00
net hw/net/e1000e: Do not abort() on invalid PSRCTL register value 2020-06-18 21:05:52 +08:00
nios2 sysbus: Convert to sysbus_realize() etc. with Coccinelle 2020-06-15 22:05:28 +02:00
nubus hw: Remove unnecessary DEVICE() cast 2020-05-15 07:08:52 +02:00
nvram sysbus: Convert to sysbus_realize() etc. with Coccinelle 2020-06-15 22:05:28 +02:00
openrisc sysbus: Convert to sysbus_realize() etc. with Coccinelle 2020-06-15 22:05:28 +02:00
pci pci: pci_create(), pci_create_multifunction() are now unused, drop 2020-06-15 22:05:28 +02:00
pci-bridge sysbus: Convert to sysbus_realize() etc. with Coccinelle 2020-06-15 22:05:28 +02:00
pci-host qdev: Convert bus-less devices to qdev_realize() with Coccinelle 2020-06-15 22:06:04 +02:00
pcmcia sysbus: Convert to sysbus_realize() etc. with Coccinelle 2020-06-15 22:05:28 +02:00
ppc * Various fixes 2020-06-26 16:55:20 +01:00
rdma lockable: Replace locks with lock guard macros 2020-05-04 16:07:43 +01:00
riscv hw/riscv: sifive_u: Add a dummy DDR memory controller device 2020-06-19 08:25:27 -07:00
rtc sysbus: Convert to sysbus_realize() etc. with Coccinelle 2020-06-15 22:05:28 +02:00
rx hw/rx: Add RX GDB simulator 2020-06-22 18:37:12 +02:00
s390x s390x/pv: Convert to ram_block_discard_disable() 2020-07-02 05:54:59 -04:00
scsi hw/scsi/megasas: Fix possible out-of-bounds array access in tracepoints 2020-06-26 09:39:37 -04:00
sd sd/milkymist-memcard: Fix error API violation 2020-06-23 16:07:21 +02:00
semihosting semihosting: remove the pthread include which seems unused 2020-06-10 11:29:44 +02:00
sh4 hw/sh4: Extract timer definitions to 'hw/timer/tmu012.h' 2020-06-22 18:37:12 +02:00
smbios hw/smbios/smbios: Remove unused include 2020-02-06 10:38:57 +01:00
sparc sysbus: Convert to sysbus_realize() etc. with Coccinelle 2020-06-15 22:05:28 +02:00
sparc64 fdc: Reject clash between -drive if=floppy and -global isa-fdc 2020-06-23 16:07:07 +02:00
ssi ssi: ssi_create_slave_no_init() is now unused, drop 2020-06-15 22:05:28 +02:00
timer hw/timer: RX62N compare match timer (CMT) 2020-06-22 18:37:12 +02:00
tpm tpm: Move backend code under the 'backends/' directory 2020-06-19 07:25:55 -04:00
tricore hw: Do not initialize MachineClass::is_default to 0 2020-02-28 14:57:19 -05:00
unicore32 hw/unicore32/puv3: Use qemu_log_mask(ERROR) instead of debug printf() 2020-06-09 19:01:56 +02:00
usb osdep: Make MIN/MAX evaluate arguments only once 2020-06-26 09:39:39 -04:00
vfio vfio: Convert to ram_block_discard_disable() 2020-07-02 05:54:59 -04:00
virtio virtio-mem: Paravirtualized memory hot(un)plug 2020-07-03 07:57:04 -04:00
watchdog hw/watchdog/cmsdk-apb-watchdog: Add trace event for lock status 2020-06-23 11:39:47 +01:00
xen xen: Actually fix build without passthrough 2020-06-26 09:39:37 -04:00
xenpv trivial: Remove xenfb_enabled from sysemu.h 2020-02-04 09:00:57 +01:00
xtensa qdev: Make qdev_prop_set_drive() match the other helpers 2020-06-23 16:07:07 +02:00
Kconfig hw/rx: RX62N microcontroller (MCU) 2020-06-22 18:37:12 +02:00
Makefile.objs xen: fix build without pci passthrough 2020-06-12 11:20:12 -04:00