mirror of https://github.com/xemu-project/xemu.git
![]() NVIDIA GPU's support MIG (Mult-Instance GPUs) feature [1], which allows partitioning of the GPU device resources (including device memory) into several (upto 8) isolated instances. Each of the partitioned memory needs a dedicated NUMA node to operate. The partitions are not fixed and they can be created/deleted at runtime. Unfortunately Linux OS does not provide a means to dynamically create/destroy NUMA nodes and such feature implementation is not expected to be trivial. The nodes that OS discovers at the boot time while parsing SRAT remains fixed. So we utilize the Generic Initiator (GI) Affinity structures that allows association between nodes and devices. Multiple GI structures per BDF is possible, allowing creation of multiple nodes by exposing unique PXM in each of these structures. Implement the mechanism to build the GI affinity structures as Qemu currently does not. Introduce a new acpi-generic-initiator object to allow host admin link a device with an associated NUMA node. Qemu maintains this association and use this object to build the requisite GI Affinity Structure. When multiple NUMA nodes are associated with a device, it is required to create those many number of acpi-generic-initiator objects, each representing a unique device:node association. Following is one of a decoded GI affinity structure in VM ACPI SRAT. [0C8h 0200 1] Subtable Type : 05 [Generic Initiator Affinity] [0C9h 0201 1] Length : 20 [0CAh 0202 1] Reserved1 : 00 [0CBh 0203 1] Device Handle Type : 01 [0CCh 0204 4] Proximity Domain : 00000007 [0D0h 0208 16] Device Handle : 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 [0E0h 0224 4] Flags (decoded below) : 00000001 Enabled : 1 [0E4h 0228 4] Reserved2 : 00000000 [0E8h 0232 1] Subtable Type : 05 [Generic Initiator Affinity] [0E9h 0233 1] Length : 20 An admin can provide a range of acpi-generic-initiator objects, each associating a device (by providing the id through pci-dev argument) to the desired NUMA node (using the node argument). Currently, only PCI device is supported. For the grace hopper system, create a range of 8 nodes and associate that with the device using the acpi-generic-initiator object. While a configuration of less than 8 nodes per device is allowed, such configuration will prevent utilization of the feature to the fullest. The following sample creates 8 nodes per PCI device for a VM with 2 PCI devices and link them to the respecitve PCI device using acpi-generic-initiator objects: -numa node,nodeid=2 -numa node,nodeid=3 -numa node,nodeid=4 \ -numa node,nodeid=5 -numa node,nodeid=6 -numa node,nodeid=7 \ -numa node,nodeid=8 -numa node,nodeid=9 \ -device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \ -object acpi-generic-initiator,id=gi0,pci-dev=dev0,node=2 \ -object acpi-generic-initiator,id=gi1,pci-dev=dev0,node=3 \ -object acpi-generic-initiator,id=gi2,pci-dev=dev0,node=4 \ -object acpi-generic-initiator,id=gi3,pci-dev=dev0,node=5 \ -object acpi-generic-initiator,id=gi4,pci-dev=dev0,node=6 \ -object acpi-generic-initiator,id=gi5,pci-dev=dev0,node=7 \ -object acpi-generic-initiator,id=gi6,pci-dev=dev0,node=8 \ -object acpi-generic-initiator,id=gi7,pci-dev=dev0,node=9 \ -numa node,nodeid=10 -numa node,nodeid=11 -numa node,nodeid=12 \ -numa node,nodeid=13 -numa node,nodeid=14 -numa node,nodeid=15 \ -numa node,nodeid=16 -numa node,nodeid=17 \ -device vfio-pci-nohotplug,host=0009:01:01.0,bus=pcie.0,addr=05.0,rombar=0,id=dev1 \ -object acpi-generic-initiator,id=gi8,pci-dev=dev1,node=10 \ -object acpi-generic-initiator,id=gi9,pci-dev=dev1,node=11 \ -object acpi-generic-initiator,id=gi10,pci-dev=dev1,node=12 \ -object acpi-generic-initiator,id=gi11,pci-dev=dev1,node=13 \ -object acpi-generic-initiator,id=gi12,pci-dev=dev1,node=14 \ -object acpi-generic-initiator,id=gi13,pci-dev=dev1,node=15 \ -object acpi-generic-initiator,id=gi14,pci-dev=dev1,node=16 \ -object acpi-generic-initiator,id=gi15,pci-dev=dev1,node=17 \ Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu [1] Cc: Jonathan Cameron <qemu-devel@nongnu.org> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Message-Id: <20240308145525.10886-2-ankita@nvidia.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> |
||
---|---|---|
.github/workflows | ||
.gitlab/issue_templates | ||
.gitlab-ci.d | ||
accel | ||
audio | ||
authz | ||
backends | ||
block | ||
bsd-user | ||
chardev | ||
common-user | ||
configs | ||
contrib | ||
crypto | ||
disas | ||
docs | ||
dump | ||
ebpf | ||
fpu | ||
fsdev | ||
gdb-xml | ||
gdbstub | ||
host/include | ||
hw | ||
include | ||
io | ||
libdecnumber | ||
linux-headers | ||
linux-user | ||
migration | ||
monitor | ||
nbd | ||
net | ||
pc-bios | ||
plugins | ||
po | ||
python | ||
qapi | ||
qga | ||
qobject | ||
qom | ||
replay | ||
roms | ||
scripts | ||
scsi | ||
semihosting | ||
stats | ||
storage-daemon | ||
stubs | ||
subprojects | ||
system | ||
target | ||
tcg | ||
tests | ||
tools | ||
trace | ||
ui | ||
util | ||
.dir-locals.el | ||
.editorconfig | ||
.exrc | ||
.gdbinit | ||
.git-blame-ignore-revs | ||
.gitattributes | ||
.gitignore | ||
.gitlab-ci.yml | ||
.gitmodules | ||
.gitpublish | ||
.mailmap | ||
.patchew.yml | ||
.readthedocs.yml | ||
.travis.yml | ||
COPYING | ||
COPYING.LIB | ||
Kconfig | ||
Kconfig.host | ||
LICENSE | ||
MAINTAINERS | ||
Makefile | ||
README.rst | ||
VERSION | ||
block.c | ||
blockdev-nbd.c | ||
blockdev.c | ||
blockjob.c | ||
configure | ||
cpu-common.c | ||
cpu-target.c | ||
event-loop-base.c | ||
gitdm.config | ||
hmp-commands-info.hx | ||
hmp-commands.hx | ||
iothread.c | ||
job-qmp.c | ||
job.c | ||
meson.build | ||
meson_options.txt | ||
module-common.c | ||
os-posix.c | ||
os-win32.c | ||
page-vary-common.c | ||
page-vary-target.c | ||
pythondeps.toml | ||
qemu-bridge-helper.c | ||
qemu-edid.c | ||
qemu-img-cmds.hx | ||
qemu-img.c | ||
qemu-io-cmds.c | ||
qemu-io.c | ||
qemu-keymap.c | ||
qemu-nbd.c | ||
qemu-options.hx | ||
qemu.nsi | ||
qemu.sasl | ||
replication.c | ||
trace-events | ||
version.rc |
README.rst
=========== QEMU README =========== QEMU is a generic and open source machine & userspace emulator and virtualizer. QEMU is capable of emulating a complete machine in software without any need for hardware virtualization support. By using dynamic translation, it achieves very good performance. QEMU can also integrate with the Xen and KVM hypervisors to provide emulated hardware while allowing the hypervisor to manage the CPU. With hypervisor support, QEMU can achieve near native performance for CPUs. When QEMU emulates CPUs directly it is capable of running operating systems made for one machine (e.g. an ARMv7 board) on a different machine (e.g. an x86_64 PC board). QEMU is also capable of providing userspace API virtualization for Linux and BSD kernel interfaces. This allows binaries compiled against one architecture ABI (e.g. the Linux PPC64 ABI) to be run on a host using a different architecture ABI (e.g. the Linux x86_64 ABI). This does not involve any hardware emulation, simply CPU and syscall emulation. QEMU aims to fit into a variety of use cases. It can be invoked directly by users wishing to have full control over its behaviour and settings. It also aims to facilitate integration into higher level management layers, by providing a stable command line interface and monitor API. It is commonly invoked indirectly via the libvirt library when using open source applications such as oVirt, OpenStack and virt-manager. QEMU as a whole is released under the GNU General Public License, version 2. For full licensing details, consult the LICENSE file. Documentation ============= Documentation can be found hosted online at `<https://www.qemu.org/documentation/>`_. The documentation for the current development version that is available at `<https://www.qemu.org/docs/master/>`_ is generated from the ``docs/`` folder in the source tree, and is built by `Sphinx <https://www.sphinx-doc.org/en/master/>`_. Building ======== QEMU is multi-platform software intended to be buildable on all modern Linux platforms, OS-X, Win32 (via the Mingw64 toolchain) and a variety of other UNIX targets. The simple steps to build QEMU are: .. code-block:: shell mkdir build cd build ../configure make Additional information can also be found online via the QEMU website: * `<https://wiki.qemu.org/Hosts/Linux>`_ * `<https://wiki.qemu.org/Hosts/Mac>`_ * `<https://wiki.qemu.org/Hosts/W32>`_ Submitting patches ================== The QEMU source code is maintained under the GIT version control system. .. code-block:: shell git clone https://gitlab.com/qemu-project/qemu.git When submitting patches, one common approach is to use 'git format-patch' and/or 'git send-email' to format & send the mail to the qemu-devel@nongnu.org mailing list. All patches submitted must contain a 'Signed-off-by' line from the author. Patches should follow the guidelines set out in the `style section <https://www.qemu.org/docs/master/devel/style.html>`_ of the Developers Guide. Additional information on submitting patches can be found online via the QEMU website * `<https://wiki.qemu.org/Contribute/SubmitAPatch>`_ * `<https://wiki.qemu.org/Contribute/TrivialPatches>`_ The QEMU website is also maintained under source control. .. code-block:: shell git clone https://gitlab.com/qemu-project/qemu-web.git * `<https://www.qemu.org/2017/02/04/the-new-qemu-website-is-up/>`_ A 'git-publish' utility was created to make above process less cumbersome, and is highly recommended for making regular contributions, or even just for sending consecutive patch series revisions. It also requires a working 'git send-email' setup, and by default doesn't automate everything, so you may want to go through the above steps manually for once. For installation instructions, please go to * `<https://github.com/stefanha/git-publish>`_ The workflow with 'git-publish' is: .. code-block:: shell $ git checkout master -b my-feature $ # work on new commits, add your 'Signed-off-by' lines to each $ git publish Your patch series will be sent and tagged as my-feature-v1 if you need to refer back to it in the future. Sending v2: .. code-block:: shell $ git checkout my-feature # same topic branch $ # making changes to the commits (using 'git rebase', for example) $ git publish Your patch series will be sent with 'v2' tag in the subject and the git tip will be tagged as my-feature-v2. Bug reporting ============= The QEMU project uses GitLab issues to track bugs. Bugs found when running code built from QEMU git or upstream released sources should be reported via: * `<https://gitlab.com/qemu-project/qemu/-/issues>`_ If using QEMU via an operating system vendor pre-built binary package, it is preferable to report bugs to the vendor's own bug tracker first. If the bug is also known to affect latest upstream code, it can also be reported via GitLab. For additional information on bug reporting consult: * `<https://wiki.qemu.org/Contribute/ReportABug>`_ ChangeLog ========= For version history and release notes, please visit `<https://wiki.qemu.org/ChangeLog/>`_ or look at the git history for more detailed information. Contact ======= The QEMU community can be contacted in a number of ways, with the two main methods being email and IRC * `<mailto:qemu-devel@nongnu.org>`_ * `<https://lists.nongnu.org/mailman/listinfo/qemu-devel>`_ * #qemu on irc.oftc.net Information on additional methods of contacting the community can be found online via the QEMU website: * `<https://wiki.qemu.org/Contribute/StartHere>`_