Commit Graph

114900 Commits

Author SHA1 Message Date
Richard Henderson 62fe57c6d2 target/ppc: Split out helper_dbczl for 970
We can determine at translation time whether the insn is or
is not dbczl.  We must retain a runtime check against the
HID5 register, but we can move that to a separate function
that never affects other ppc models.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-23 10:56:16 +10:00
Richard Henderson 521a80d895 target/ppc: Hoist dcbz_size out of dcbz_common
The 970 logic does not apply to dcbzep, which is an e500 insn.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-23 10:56:16 +10:00
BALATON Zoltan 73a93ae5f4 target/ppc/mem_helper.c: Remove a conditional from dcbz_common()
Instead of passing a bool and select a value within dcbz_common() let
the callers pass in the right value to avoid this conditional
statement. On PPC dcbz is often used to zero memory and some code uses
it a lot. This change improves the run time of a test case that copies
memory with a dcbz call in every iteration from 6.23 to 5.83 seconds.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <20240622204833.5F7C74E6000@zero.eik.bme.hu>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-23 10:56:16 +10:00
Richard Henderson 3b9991e35c target/arm: Use set/clear_helper_retaddr in SVE and SME helpers
Avoid a race condition with munmap in another thread.
Use around blocks that exclusively use "host_fn".
Keep the blocks as small as possible, but without setting
and clearing for every operation on one page.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-23 10:56:04 +10:00
Richard Henderson 8009519b30 target/arm: Use set/clear_helper_retaddr in helper-a64.c
Use these in helper_dc_dva and the FEAT_MOPS routines to
avoid a race condition with munmap in another thread.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-23 10:56:04 +10:00
Richard Henderson 3d75856d1a accel/tcg: Move {set,clear}_helper_retaddr to cpu_ldst.h
Use of these in helpers goes hand-in-hand with tlb_vaddr_to_host
and other probing functions.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-23 10:56:04 +10:00
Wilfred Mallawa 4f947b10d5 hw/nvme: Add SPDM over DOE support
Setup Data Object Exchange (DOE) as an extended capability for the NVME
controller and connect SPDM to it (CMA) to it.

Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Klaus Jensen <k.jensen@samsung.com>
Message-Id: <20240703092027.644758-4-alistair.francis@wdc.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:42 -04:00
Huai-Cheng Kuo bc419a1cc5 backends: Initial support for SPDM socket support
SPDM enables authentication, attestation and key exchange to assist in
providing infrastructure security enablement. It's a standard published
by the DMTF [1].

SPDM supports multiple transports, including PCIe DOE and MCTP.
This patch adds support to QEMU to connect to an external SPDM
instance.

SPDM support can be added to any QEMU device by exposing a
TCP socket to a SPDM server. The server can then implement the SPDM
decoding/encoding support, generally using libspdm [2].

This is similar to how the current TPM implementation works and means
that the heavy lifting of setting up certificate chains, capabilities,
measurements and complex crypto can be done outside QEMU by a well
supported and tested library.

1: https://www.dmtf.org/standards/SPDM
2: https://github.com/DMTF/libspdm

Signed-off-by: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
Signed-off-by: Chris Browy <cbrowy@avery-design.com>
Co-developed-by: Jonathan Cameron <Jonathan.cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
[ Changes by WM
 - Bug fixes from testing
]
Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
[ Changes by AF:
 - Convert to be more QEMU-ified
 - Move to backends as it isn't PCIe specific
]
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20240703092027.644758-3-alistair.francis@wdc.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:42 -04:00
Alistair Francis 78cc8c6947 hw/pci: Add all Data Object Types defined in PCIe r6.0
Add all of the defined protocols/features from the PCIe-SIG r6.0
"Table 6-32 PCI-SIG defined Data Object Types (Vendor ID = 0001h)"
table.

Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Message-Id: <20240703092027.644758-2-alistair.francis@wdc.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:42 -04:00
Sunil V L e9c0d54f4a tests/acpi: Add expected ACPI AML files for RISC-V
As per the step 5 in the process documented in bios-tables-test.c,
generate the expected ACPI AML data files for RISC-V using the
rebuild-expected-aml.sh script and update the
bios-tables-test-allowed-diff.h.

These are all new files being added for the first time. Hence, iASL diff
output is not added.

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240716144306.2432257-10-sunilvl@ventanamicro.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:42 -04:00
Sunil V L 5b966e548f tests/qtest/bios-tables-test.c: Enable basic testing for RISC-V
Add basic ACPI table test case for RISC-V.

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240716144306.2432257-9-sunilvl@ventanamicro.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:42 -04:00
Sunil V L cc3ba24225 tests/acpi: Add empty ACPI data files for RISC-V
As per process documented (steps 1-3) in bios-tables-test.c, add empty
AML data files for RISC-V ACPI tables and add the entries in
bios-tables-test-allowed-diff.h.

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240716144306.2432257-8-sunilvl@ventanamicro.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:42 -04:00
Sunil V L 329b327924 tests/qtest/bios-tables-test.c: Remove the fall back path
The expected ACPI AML files are moved now under ${arch}/{machine} path.
Hence, there is no need to search in old path which didn't have ${arch}.
Remove the code which searches for the expected AML files under old path
as well.

Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240716144306.2432257-7-sunilvl@ventanamicro.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:42 -04:00
Sunil V L 0af3dfa5c5 tests/acpi: update expected DSDT blob for aarch64 and microvm
After PCI link devices are moved out of the scope of PCI root complex,
the DSDT files of machines which use GPEX, will change. So, update the
expected AML files with these changes for these machines.

Mainly, there are 2 changes.

1) Since the link devices are created now directly under _SB for all PCI
root bridges in the system, they should have unique names. So, instead
of GSIx, named those devices as LXXY where L means link, XX will have
PCI bus number and Y will have the INTx number (ex: L000 or L001). The
_PRT entries will also be updated to reflect this name change.

2) PCI link devices are moved from the scope of each PCI root bridge to
directly under _SB.

Below is the sample iASL difference for one such link device.

Scope (\_SB)
{
    Name (_HID, "LNRO0005")  // _HID: Hardware ID
    Name (_UID, 0x1F)  // _UID: Unique ID
    Name (_CCA, One)  // _CCA: Cache Coherency Attribute
    Name (_CRS, ResourceTemplate ()  // _CRS: Current Resource Settings
    {
        Memory32Fixed (ReadWrite,
            0x0A003E00,         // Address Base
            0x00000200,         // Address Length
            )
        Interrupt (ResourceConsumer, Level, ActiveHigh, Exclusive, ,, )
        {
            0x0000004F,
        }
    })

+   Device (L000)
+   {
+       Name (_HID, "PNP0C0F" /* PCI Interrupt Link Device */)
+       Name (_UID, Zero)  // _UID: Unique ID
+       Name (_PRS, ResourceTemplate ()
+       {
+           Interrupt (ResourceConsumer, Level, ActiveHigh, Exclusive, ,, )
+           {
+               0x00000023,
+           }
+       })
+       Name (_CRS, ResourceTemplate ()
+       {
+           Interrupt (ResourceConsumer, Level, ActiveHigh, Exclusive, ,, )
+           {
+               0x00000023,
+           }
+       })
+       Method (_SRS, 1, NotSerialized)  // _SRS: Set Resource Settings
+       {
+       }
+   }
+
      Device (PCI0)
      {
          Name (_HID, "PNP0A08" /* PCI Express Bus */)  // _HID: Hardware ID
          Name (_CID, "PNP0A03" /* PCI Bus */)  // _CID: Compatible ID
          Name (_SEG, Zero)  // _SEG: PCI Segment
          Name (_BBN, Zero)  // _BBN: BIOS Bus Number
          Name (_UID, Zero)  // _UID: Unique ID
          Name (_STR, Unicode ("PCIe 0 Device"))  // _STR: Description String
          Name (_CCA, One)  // _CCA: Cache Coherency Attribute
          Name (_PRT, Package (0x80)  // _PRT: PCI Routing Table
          {

              Package (0x04)
              {
                  0xFFFF,
                  Zero,
-                 GSI0,
+                 L000,
                  Zero
              },

               .....

          })

          Device (GSI0)
          {
              Name (_HID, "PNP0C0F" /* PCI Interrupt Link Device */)
              Name (_UID, Zero)  // _UID: Unique ID
              Name (_PRS, ResourceTemplate ()
              {
                 Interrupt (ResourceConsumer, Level, ActiveHigh, Exclusive, ,, )
                 {
                     0x00000023,
                 }
              })
              Name (_CRS, ResourceTemplate ()
              {
                 Interrupt (ResourceConsumer, Level, ActiveHigh, Exclusive, ,, )
                 {
                     0x00000023,
                 }
              })
              Method (_SRS, 1, NotSerialized)  // _SRS: Set Resource Settings
              {
              }
          }
      }
}

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Message-Id: <20240716144306.2432257-6-sunilvl@ventanamicro.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:42 -04:00
Sunil V L 35520bc702 acpi/gpex: Create PCI link devices outside PCI root bridge
Currently, PCI link devices (PNP0C0F) are always created within the
scope of the PCI root bridge. However, RISC-V needs these link devices
to be created outside to ensure the probing order in the OS. This
matches the example given in the ACPI specification [1] as well. Hence,
create these link devices directly under _SB instead of under the PCI
root bridge.

To keep these link device names unique for multiple PCI bridges, change
the device name from GSIx to LXXY format where XX is the PCI bus number
and Y is the INTx.

GPEX is currently used by riscv, aarch64/virt and x86/microvm machines.
So, this change will alter the DSDT for those systems.

[1] - ACPI 5.1: 6.2.13.1 Example: Using _PRT to Describe PCI IRQ Routing

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240716144306.2432257-5-sunilvl@ventanamicro.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:42 -04:00
Sunil V L af09c25199 tests/acpi: Allow DSDT acpi table changes for aarch64
so that CI tests don't fail when those ACPI tables are updated in the
next patch. This is as per the documentation in bios-tables-tests.c.

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240716144306.2432257-4-sunilvl@ventanamicro.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:42 -04:00
Sunil V L faacd2e6b6 hw/riscv/virt-acpi-build.c: Update the HID of RISC-V UART
The requirement ACPI_060 in the RISC-V BRS specification [1], requires
NS16550 compatible UART to have the HID RSCV0003. So, update the HID for
the UART.

[1] -  https://github.com/riscv-non-isa/riscv-brs/releases/download/v0.0.2/riscv-brs-spec.pdf
       (Chapter 6)

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240716144306.2432257-3-sunilvl@ventanamicro.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:42 -04:00
Sunil V L a54dd0cd6b hw/riscv/virt-acpi-build.c: Add namespace devices for PLIC and APLIC
As per the requirement ACPI_080 in the RISC-V Boot and Runtime Services
(BRS) specification [1],  PLIC and APLIC should be in namespace as well.
So, add them using the defined HID.

[1] - https://github.com/riscv-non-isa/riscv-brs/releases/download/v0.0.2/riscv-brs-spec.pdf
      (Chapter 6)

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240716144306.2432257-2-sunilvl@ventanamicro.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:42 -04:00
Eric Auger 6c027a9de3 virtio-iommu: Add trace point on virtio_iommu_detach_endpoint_from_domain
Add a trace point on virtio_iommu_detach_endpoint_from_domain().

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20240716094619.1713905-7-eric.auger@redhat.com>
Tested-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:42 -04:00
Eric Auger a6586419a1 hw/vfio/common: Add vfio_listener_region_del_iommu trace event
Trace when VFIO gets notified about the deletion of an IOMMU MR.
Also trace the name of the region in the add_iommu trace message.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20240716094619.1713905-6-eric.auger@redhat.com>
Tested-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:42 -04:00
Eric Auger 1993d634d5 virtio-iommu: Remove the end point on detach
We currently miss the removal of the endpoint in case of detach.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20240716094619.1713905-5-eric.auger@redhat.com>
Tested-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:42 -04:00
Eric Auger 62ac01d1de virtio-iommu: Free [host_]resv_ranges on unset_iommu_devices
We are currently missing the deallocation of the [host_]resv_regions
in case of hot unplug. Also to make things more simple let's rule
out the case where multiple HostIOMMUDevices would be aliased and
attached to the same IOMMUDevice. This allows to remove the handling
of conflicting Host reserved regions. Anyway this is not properly
supported at guest kernel level. On hotunplug the reserved regions
are reset to the ones set by virtio-iommu property.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20240716094619.1713905-4-eric.auger@redhat.com>
Tested-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:42 -04:00
Eric Auger 3745768918 virtio-iommu: Remove probe_done
Now we have switched to PCIIOMMUOps to convey host IOMMU information,
the host reserved regions are transmitted when the PCIe topology is
built. This happens way before the virtio-iommu driver calls the probe
request. So let's remove the probe_done flag that allowed to check
the probe was not done before the IOMMU MR got enabled. Besides this
probe_done flag had a flaw wrt migration since it was not saved/restored.

The only case at risk is if 2 devices were plugged to a
PCIe to PCI bridge and thus aliased. First of all we
discovered in the past this case was not properly supported for
neither SMMU nor virtio-iommu on guest kernel side: see

[RFC] virtio-iommu: Take into account possible aliasing in virtio_iommu_mr()
https://lore.kernel.org/all/20230116124709.793084-1-eric.auger@redhat.com/

If this were supported by the guest kernel, it is unclear what the call
sequence would be from a virtio-iommu driver point of view.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20240716094619.1713905-3-eric.auger@redhat.com>
Tested-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:42 -04:00
Eric Auger 935c391418 Revert "virtio-iommu: Clear IOMMUDevice when VFIO device is unplugged"
This reverts commit 1b889d6e39.
There are different problems with that tentative fix:
- Some resources are left dangling (resv_regions,
  host_resv_ranges) and memory subregions are left attached to
  the root MR although freed as embedded in the sdev IOMMUDevice.
  Finally the sdev->as is not destroyed and associated listeners
  are left.
- Even when fixing the above we observe a memory corruption
  associated with the deallocation of the IOMMUDevice. This can
  be observed when a VFIO device is hotplugged, hot-unplugged
  and a system reset is issued. At this stage we have not been
  able to identify the root cause (IOMMU MR or as structs beeing
  overwritten and used later on?).
- Another issue is HostIOMMUDevice are indexed by non aliased
  BDF whereas the IOMMUDevice is indexed by aliased BDF - yes the
  current naming is really misleading -. Given the state of the
  code I don't think the virtio-iommu device works in non
  singleton group case though.

So let's revert the patch for now. This means the IOMMU MR/as survive
the hotunplug. This is what is done in the intel_iommu for instance.
It does not sound very logical to keep those but currently there is
no symetric function to pci_device_iommu_address_space().

probe_done issue will be handled in a subsequent patch. Also
resv_regions and host_resv_regions will be deallocated separately.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20240716094619.1713905-2-eric.auger@redhat.com>
Tested-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:41 -04:00
Salil Mehta 242da18082 gdbstub: Add helper function to unregister GDB register space
Add common function to help unregister the GDB register space. This shall be
done in context to the CPU unrealization.

Note: These are common functions exported to arch specific code. For example,
for ARM this code is being referred in associated arch specific patch-set:

Link: https://lore.kernel.org/qemu-devel/20230926103654.34424-1-salil.mehta@huawei.com/

Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Xianglai Li <lixianglai@loongson.cn>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Reviewed-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Zhao Liu <zhao1.liu@intel.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240716111502.202344-8-salil.mehta@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:41 -04:00
Salil Mehta 24bec42f3d physmem: Add helper function to destroy CPU AddressSpace
Virtual CPU Hot-unplug leads to unrealization of a CPU object. This also
involves destruction of the CPU AddressSpace. Add common function to help
destroy the CPU AddressSpace.

Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Xianglai Li <lixianglai@loongson.cn>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Tested-by: Zhao Liu <zhao1.liu@intel.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240716111502.202344-7-salil.mehta@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:41 -04:00
Salil Mehta efdb43b831 hw/acpi: Update CPUs AML with cpu-(ctrl)dev change
CPUs Control device(\\_SB.PCI0) register interface for the x86 arch is IO port
based and existing CPUs AML code assumes _CRS objects would evaluate to a system
resource which describes IO Port address. But on ARM arch CPUs control
device(\\_SB.PRES) register interface is memory-mapped hence _CRS object should
evaluate to system resource which describes memory-mapped base address. Update
build CPUs AML function to accept both IO/MEMORY region spaces and accordingly
update the _CRS object.

Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Xianglai Li <lixianglai@loongson.cn>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Tested-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240716111502.202344-6-salil.mehta@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:41 -04:00
Salil Mehta 549c9a9dcb hw/acpi: Update GED _EVT method AML with CPU scan
OSPM evaluates _EVT method to map the event. The CPU hotplug event eventually
results in start of the CPU scan. Scan figures out the CPU and the kind of
event(plug/unplug) and notifies it back to the guest. Update the GED AML _EVT
method with the call to method \\_SB.CPUS.CSCN (via \\_SB.GED.CSCN)

Architecture specific code [1] might initialize its CPUs AML code by calling
common function build_cpus_aml() like below for ARM:

build_cpus_aml(scope, ms, opts, xx_madt_cpu_entry, memmap[VIRT_CPUHP_ACPI].base,
               "\\_SB", "\\_SB.GED.CSCN", AML_SYSTEM_MEMORY);

[1] https://lore.kernel.org/qemu-devel/20240613233639.202896-13-salil.mehta@huawei.com/

Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Xianglai Li <lixianglai@loongson.cn>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Tested-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240716111502.202344-5-salil.mehta@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:41 -04:00
Salil Mehta 06f1f4958b hw/acpi: Update ACPI GED framework to support vCPU Hotplug
ACPI GED (as described in the ACPI 6.4 spec) uses an interrupt listed in the
_CRS object of GED to intimate OSPM about an event. Later then demultiplexes the
notified event by evaluating ACPI _EVT method to know the type of event. Use
ACPI GED to also notify the guest kernel about any CPU hot(un)plug events.

Note, GED interface is used by many hotplug events like memory hotplug, NVDIMM
hotplug and non-hotplug events like system power down event. Each of these can
be selected using a bit in the 32 bit GED IO interface. A bit has been reserved
for the CPU hotplug event.

ACPI CPU hotplug related initialization should only happen if ACPI_CPU_HOTPLUG
support has been enabled for particular architecture. Add cpu_hotplug_hw_init()
stub to avoid compilation break.

Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Xianglai Li <lixianglai@loongson.cn>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20240716111502.202344-4-salil.mehta@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
2024-07-22 20:15:41 -04:00
Salil Mehta 2f1a85daf3 hw/acpi: Move CPU ctrl-dev MMIO region len macro to common header file
CPU ctrl-dev MMIO region length could be used in ACPI GED and various other
architecture specific places. Move ACPI_CPU_HOTPLUG_REG_LEN macro to more
appropriate common header file.

Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Xianglai Li <lixianglai@loongson.cn>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Tested-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240716111502.202344-3-salil.mehta@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:41 -04:00
Salil Mehta 08c3286822 accel/kvm: Extract common KVM vCPU {creation,parking} code
KVM vCPU creation is done once during the vCPU realization when Qemu vCPU thread
is spawned. This is common to all the architectures as of now.

Hot-unplug of vCPU results in destruction of the vCPU object in QOM but the
corresponding KVM vCPU object in the Host KVM is not destroyed as KVM doesn't
support vCPU removal. Therefore, its representative KVM vCPU object/context in
Qemu is parked.

Refactor architecture common logic so that some APIs could be reused by vCPU
Hotplug code of some architectures likes ARM, Loongson etc. Update new/old APIs
with trace events. New APIs qemu_{create,park,unpark}_vcpu() can be externally
called. No functional change is intended here.

Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Xianglai Li <lixianglai@loongson.cn>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Reviewed-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240716111502.202344-2-salil.mehta@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:41 -04:00
Igor Mammedov 62f182c97b smbios: make memory device size configurable per Machine
Currently QEMU describes initial[1] RAM* in SMBIOS as a series of
virtual DIMMs (capped at 16Gb max) using type 17 structure entries.

Which is fine for the most cases.  However when starting guest
with terabytes of RAM this leads to too many memory device
structures, which eventually upsets linux kernel as it reserves
only 64K for these entries and when that border is crossed out
it runs out of reserved memory.

Instead of partitioning initial RAM on 16Gb DIMMs, use maximum
possible chunk size that SMBIOS spec allows[2]. Which lets
encode RAM in lower 31 bits of 32bit field (which amounts upto
2047Tb per DIMM).
As result initial RAM will generate only one type 17 structure
until host/guest reach ability to use more RAM in the future.

Compat changes:
We can't unconditionally change chunk size as it will break
QEMU<->guest ABI (and migration). Thus introduce a new machine
class field that would let older versioned machines to use
legacy 16Gb chunks, while new(er) machine type[s] use maximum
possible chunk size.

PS:
While it might seem to be risky to rise max entry size this large
(much beyond of what current physical RAM modules support),
I'd not expect it causing much issues, modulo uncovering bugs
in software running within guest. And those should be fixed
on guest side to handle SMBIOS spec properly, especially if
guest is expected to support so huge RAM configs.

In worst case, QEMU can reduce chunk size later if we would
care enough about introducing a workaround for some 'unfixable'
guest OS, either by fixing up the next machine type or
giving users a CLI option to customize it.

1) Initial RAM - is RAM configured with help '-m SIZE' CLI option/
   implicitly defined by machine. It doesn't include memory
   configured with help of '-device' option[s] (pcdimm,nvdimm,...)
2) SMBIOS 3.1.0 7.18.5 Memory Device — Extended Size

PS:
* tested on 8Tb host with RHEL6 guest, which seems to parse
  type 17 SMBIOS table entries correctly (according to 'dmidecode').

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240715122417.4059293-1-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:41 -04:00
Akihiko Odaki d6f40c95b3 docs: Document composable SR-IOV device
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20240715-sriov-v5-8-3f5539093ffc@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:41 -04:00
Akihiko Odaki c2d6db6a1f virtio-net: Implement SR-IOV VF
A virtio-net device can be added as a SR-IOV VF to another virtio-pci
device that will be the PF.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20240715-sriov-v5-7-3f5539093ffc@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:41 -04:00
Akihiko Odaki 3f868ffb0b virtio-pci: Implement SR-IOV PF
Allow user to attach SR-IOV VF to a virtio-pci PF.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20240715-sriov-v5-6-3f5539093ffc@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:41 -04:00
Akihiko Odaki 122173a583 pcie_sriov: Allow user to create SR-IOV device
A user can create a SR-IOV device by specifying the PF with the
sriov-pf property of the VFs. The VFs must be added before the PF.

A user-creatable VF must have PCIDeviceClass::sriov_vf_user_creatable
set. Such a VF cannot refer to the PF because it is created before the
PF.

A PF that user-creatable VFs can be attached calls
pcie_sriov_pf_init_from_user_created_vfs() during realization and
pcie_sriov_pf_exit() when exiting.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20240715-sriov-v5-5-3f5539093ffc@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:41 -04:00
Akihiko Odaki 47cc753e50 pcie_sriov: Check PCI Express for SR-IOV PF
SR-IOV requires PCI Express.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20240715-sriov-v5-4-3f5539093ffc@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:41 -04:00
Akihiko Odaki 78f9d7fd19 pcie_sriov: Ensure PF and VF are mutually exclusive
A device cannot be a SR-IOV PF and a VF at the same time.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20240715-sriov-v5-3-3f5539093ffc@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:41 -04:00
Akihiko Odaki ca6dd3aef8 hw/pci: Fix SR-IOV VF number calculation
pci_config_get_bar_addr() had a division by vf_stride. vf_stride needs
to be non-zero when there are multiple VFs, but the specification does
not prohibit to make it zero when there is only one VF.

Do not perform the division for the first VF to avoid division by zero.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20240715-sriov-v5-2-3f5539093ffc@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:41 -04:00
Richard Henderson d92cf77b79 * Minor clean-ups and fixes for the qtests and Avocado tests
* Fix crash that happens when introspecting scsi-block on older machine types
 * s390x: filter deprecated properties based on model expansion type
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmaeSUMRHHRodXRoQHJl
 ZGhhdC5jb20ACgkQLtnXdP5wLbVdQw/8DvGymXKwpS0F2aSHg3AZvjSCpkv3Y+fK
 myQrzh30cv9Vhe/Y9do47HpfJ6Ug9SK6xG64K2o+BIW+G3+ZUwSHk24PoiALsrJf
 9qqya1upBJkEC5B4PhqRPS3GlbvBnKKEk8W6BMpUa2BToFV9MsG256cBVhUrRpGc
 6u80DgTNxCI1czsNkWVGJAt1oVLYYJIjz7UZ4VbZCH48o6r0iSUV6C01wccOFmNy
 IXbspyyUftWFh9lO0i8PiYlXG2YEAmFry3gqD5vc+6BsFT4lMeoRFFxbVCddGKFc
 iNwlH4ayjeISlEJeClImIdbHyZ+sDhPyy5x4cpQqmZudEPn+GVnZ0arm7OvXW/k8
 Yog4n7/cUz7GHnWbqYIFZMS1g1wmqm/9VPsVTzXAlTva4dTTs2p0tKAADHIAtPCI
 jxSPpbuCuukDzUZGsNZyRGbex6g4B0tP4TMHRFxo5LVy9dKn2BLOHBWuzPevD9OO
 FphZHUuGngcPi4GSFmlv7aCS0pqyWsCO+5EqoYUgO8yadyfiXN9pwjB6OnBZux0U
 kbJOkkBJwEalhsiHmPFMnS8rkWa4Ye4ZJjj8XHRiecxSZOcNOcxyE+l2x8CV2aFB
 UBR83nm86vXXpu86Yod3E+txDEUzKN5+B8X0q7Se0YvsWbB+1Dq/Co0Bdh/Wp70E
 EPk5eqaSp8k=
 =zB5F
 -----END PGP SIGNATURE-----

Merge tag 'pull-request-2024-07-22' of https://gitlab.com/thuth/qemu into staging

* Minor clean-ups and fixes for the qtests and Avocado tests
* Fix crash that happens when introspecting scsi-block on older machine types
* s390x: filter deprecated properties based on model expansion type

# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmaeSUMRHHRodXRoQHJl
# ZGhhdC5jb20ACgkQLtnXdP5wLbVdQw/8DvGymXKwpS0F2aSHg3AZvjSCpkv3Y+fK
# myQrzh30cv9Vhe/Y9do47HpfJ6Ug9SK6xG64K2o+BIW+G3+ZUwSHk24PoiALsrJf
# 9qqya1upBJkEC5B4PhqRPS3GlbvBnKKEk8W6BMpUa2BToFV9MsG256cBVhUrRpGc
# 6u80DgTNxCI1czsNkWVGJAt1oVLYYJIjz7UZ4VbZCH48o6r0iSUV6C01wccOFmNy
# IXbspyyUftWFh9lO0i8PiYlXG2YEAmFry3gqD5vc+6BsFT4lMeoRFFxbVCddGKFc
# iNwlH4ayjeISlEJeClImIdbHyZ+sDhPyy5x4cpQqmZudEPn+GVnZ0arm7OvXW/k8
# Yog4n7/cUz7GHnWbqYIFZMS1g1wmqm/9VPsVTzXAlTva4dTTs2p0tKAADHIAtPCI
# jxSPpbuCuukDzUZGsNZyRGbex6g4B0tP4TMHRFxo5LVy9dKn2BLOHBWuzPevD9OO
# FphZHUuGngcPi4GSFmlv7aCS0pqyWsCO+5EqoYUgO8yadyfiXN9pwjB6OnBZux0U
# kbJOkkBJwEalhsiHmPFMnS8rkWa4Ye4ZJjj8XHRiecxSZOcNOcxyE+l2x8CV2aFB
# UBR83nm86vXXpu86Yod3E+txDEUzKN5+B8X0q7Se0YvsWbB+1Dq/Co0Bdh/Wp70E
# EPk5eqaSp8k=
# =zB5F
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 22 Jul 2024 09:57:55 PM AEST
# gpg:                using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg:                issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full]
# gpg:                 aka "Thomas Huth <thuth@redhat.com>" [full]
# gpg:                 aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# gpg:                 aka "Thomas Huth <huth@tuxfamily.org>" [full]

* tag 'pull-request-2024-07-22' of https://gitlab.com/thuth/qemu:
  target/s390x: filter deprecated properties based on model expansion type
  tests: increase timeout per instance of bios-tables-test
  qtest/fuzz: make range overlap check more readable
  hw: Fix crash that happens when introspecting scsi-block on older machine types
  tests/avocado/machine_aspeed.py: Increase timeout for TPM test
  tests/avocado: Remove the remainders of the virtiofs_submounts test
  tests/avocado/mem-addr-space-check: Remove unused "import signal"
  tests/avocado: Move LinuxTest related code into a separate file
  tests/avocado: Allow overwriting AVOCADO_SHOW env variable
  tests/avocado/boot_xen.py: use class attribute
  tests/avocado/boot_xen.py: unify tags
  tests/avocado/boot_xen.py: merge base classes

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-23 08:31:21 +10:00
songziming 903cc9e117 chardev/char-win-stdio.c: restore old console mode
If I use `-serial stdio` on Windows, after QEMU exits, the terminal
could not handle arrow keys and tab any more. Because stdio backend
on Windows sets console mode to virtual terminal input when starts,
but does not restore the old mode when finalize.

This small patch saves the old console mode and set it back.

Signed-off-by: Ziming Song <s.ziming@hotmail.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-ID: <ME3P282MB25488BE7C39BF0C35CD0DA5D8CA82@ME3P282MB2548.AUSP282.PROD.OUTLOOK.COM>
2024-07-22 22:25:46 +04:00
Paolo Bonzini 7c912ffb59 hpet: avoid timer storms on periodic timers
If the period is set to a value that is too low, there could be no
time left to run the rest of QEMU.  Do not trigger interrupts faster
than 1 MHz.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-22 19:19:44 +02:00
Paolo Bonzini 242d665396 hpet: store full 64-bit target value of the counter
Store the full 64-bit value at which the timer should fire.

This makes it possible to skip the imprecise hpet_calculate_diff()
step, and to remove the clamping of the period to 31 or 63 bits.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-22 19:19:44 +02:00
Paolo Bonzini c236656737 hpet: accept 64-bit reads and writes
Declare the MemoryRegionOps so that 64-bit reads and writes to the HPET
are received directly.  This makes it possible to unify the code to
process low and high parts: for 32-bit reads, extract the desired word;
for 32-bit writes, just merge the desired part into the old value and
proceed as with a 64-bit write.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-22 19:19:44 +02:00
Paolo Bonzini ba88935b0f hpet: place read-only bits directly in "new_val"
The variable "val" is used for two different purposes.  As an intermediate
value when writing configuration registers, and to store the cleared bits
when writing ISR.

Use "new_val" for the former, and rename the variable so that it is clearer
for the latter case.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-22 19:19:44 +02:00
Paolo Bonzini 5895879aca hpet: remove unnecessary variable "index"
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-22 19:19:44 +02:00
Paolo Bonzini 9eb7fad354 hpet: ignore high bits of comparator in 32-bit mode
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-22 19:19:44 +02:00
Paolo Bonzini f0ccf77078 hpet: fix and cleanup persistence of interrupt status
There are several bugs in the handling of the ISR register:

- switching level->edge was not lowering the interrupt and
  clearing ISR

- switching on the enable bit was not raising a level-triggered
  interrupt if the timer had fired

- the timer must be kept running even if not enabled, in
  order to set the ISR flag, so writes to HPET_TN_CFG must
  not call hpet_del_timer()

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-22 19:19:44 +02:00
Anthony Harivel 0418f90809 Add support for RAPL MSRs in KVM/Qemu
Starting with the "Sandy Bridge" generation, Intel CPUs provide a RAPL
interface (Running Average Power Limit) for advertising the accumulated
energy consumption of various power domains (e.g. CPU packages, DRAM,
etc.).

The consumption is reported via MSRs (model specific registers) like
MSR_PKG_ENERGY_STATUS for the CPU package power domain. These MSRs are
64 bits registers that represent the accumulated energy consumption in
micro Joules. They are updated by microcode every ~1ms.

For now, KVM always returns 0 when the guest requests the value of
these MSRs. Use the KVM MSR filtering mechanism to allow QEMU handle
these MSRs dynamically in userspace.

To limit the amount of system calls for every MSR call, create a new
thread in QEMU that updates the "virtual" MSR values asynchronously.

Each vCPU has its own vMSR to reflect the independence of vCPUs. The
thread updates the vMSR values with the ratio of energy consumed of
the whole physical CPU package the vCPU thread runs on and the
thread's utime and stime values.

All other non-vCPU threads are also taken into account. Their energy
consumption is evenly distributed among all vCPUs threads running on
the same physical CPU package.

To overcome the problem that reading the RAPL MSR requires priviliged
access, a socket communication between QEMU and the qemu-vmsr-helper is
mandatory. You can specified the socket path in the parameter.

This feature is activated with -accel kvm,rapl=true,path=/path/sock.sock

Actual limitation:
- Works only on Intel host CPU because AMD CPUs are using different MSR
  adresses.

- Only the Package Power-Plane (MSR_PKG_ENERGY_STATUS) is reported at
  the moment.

Signed-off-by: Anthony Harivel <aharivel@redhat.com>
Link: https://lore.kernel.org/r/20240522153453.1230389-4-aharivel@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-22 19:19:37 +02:00
Yao Xingtao 4ea3de93a3 hw/nvme: remove useless type cast
The type of req->cmd is NvmeCmd, cast the pointer of this type to
NvmeCmd* is useless.

Signed-off-by: Yao Xingtao <yaoxt.fnst@fujitsu.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2024-07-22 14:43:17 +02:00