Commit Graph

118872 Commits

Author SHA1 Message Date
Steve Sistare 7fd0224457 physmem: fix qemu_ram_alloc_from_fd size calculation
qemu_ram_alloc_from_fd allocates space if file_size == 0.  If non-zero,
it uses the existing space and verifies it is large enough, but the
verification was broken when the offset parameter was introduced.  As
a result, a file smaller than offset passes the verification and causes
errors later.  Fix that, and update the error message to include offset.

Peter provides this concise reproducer:

  $ touch ramfile
  $ truncate -s 64M ramfile
  $ ./qemu-system-x86_64 -object memory-backend-file,mem-path=./ramfile,offset=128M,size=128M,id=mem1,prealloc=on
  qemu-system-x86_64: qemu_prealloc_mem: preallocating memory failed: Bad address

With the fix, the error message is:
  qemu-system-x86_64: mem1 backing store size 0x4000000 is too small for 'size' option 0x8000000 plus 'offset' option 0x8000000

Cc: qemu-stable@nongnu.org
Fixes: 4b870dc4d0 ("hostmem-file: add offset option")
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-3-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
(cherry picked from commit 719168fba7c3215cc996dcfd32a6e5e9c7b8eee0)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-02-01 11:56:57 +03:00
Hongren Zheng e82fbf01b6 hw/usb/canokey: Fix buffer overflow for OUT packet
When USBPacket in OUT direction has larger payload
than the ep_out_buffer (of size 512), a buffer overflow
would occur.

It could be fixed by limiting the size of usb_packet_copy
to be at most buffer size. Further optimization gets rid
of the ep_out_buffer and directly uses ep_out as the target
buffer.

This is reported by a security researcher who artificially
constructed an OUT packet of size 2047. The report has gone
through the QEMU security process, and as this device is for
testing purpose and no deployment of it in virtualization
environment is observed, it is triaged not to be a security bug.

Cc: qemu-stable@nongnu.org
Fixes: d7d3491855 ("hw/usb: Add CanoKey Implementation")
Reported-by: Juan Jose Lopez Jaimez <thatjiaozi@gmail.com>
Signed-off-by: Hongren Zheng <i@zenithal.me>
Message-id: Z4TfMOrZz6IQYl_h@Sun
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 664280abddcb3cacc9c6204706bb739fcc1316f7)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-29 22:29:03 +03:00
Peter Maydell c806bbe8c1 target/arm: arm_reset_sve_state() should set FPSR, not FPCR
The pseudocode ResetSVEState() does:
    FPSR = ZeroExtend(0x0800009f<31:0>, 64);
but QEMU's arm_reset_sve_state() called vfp_set_fpcr() by accident.

Before the advent of FEAT_AFP, this was only setting a collection of
RES0 bits, which vfp_set_fpsr() would then ignore, so the only effect
was that we didn't actually set the FPSR the way we are supposed to
do.  Once FEAT_AFP is implemented, setting the bottom bits of FPSR
will change the floating point behaviour.

Call vfp_set_fpsr(), as we ought to.

(Note for stable backports: commit 7f2a01e736 moved this function
from sme_helper.c to helper.c, but it had the same bug before the
move too.)

Cc: qemu-stable@nongnu.org
Fixes: f84734b874 ("target/arm: Implement SMSTART, SMSTOP")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250124162836.2332150-4-peter.maydell@linaro.org
(cherry picked from commit 1edc3d43f20df0d04f8d00b906ba19fed37512a5)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-29 22:25:22 +03:00
Zhao Liu c597e6f26d stub: Fix build failure with --enable-user --disable-system --enable-tools
Configuring "--enable-user --disable-system --enable-tools" causes the
build failure with the following information:

/usr/bin/ld: libhwcore.a.p/hw_core_qdev.c.o: in function `device_finalize':
/qemu/build/../hw/core/qdev.c:688: undefined reference to `qapi_event_send_device_deleted'
collect2: error: ld returned 1 exit status

To fix the above issue, add qdev.c stub when build with `have_tools`.

With this fix, QEMU could be successfully built in the following cases:
 --enable-user --disable-system --enable-tools
 --enable-user --disable-system --disable-tools
 --enable-user --disable-system

Cc: qemu-stable@nongnu.org
Fixes: 388b849fb6 ("stubs: avoid duplicate symbols in libqemuutil.a")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2766
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250121154318.214680-1-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 8113dbbcdaee05f319a7e48272416d918cb2b04a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-29 22:23:55 +03:00
Matt Borgerson 86344a8543 nv2a: Add swizzle test and benchmark 2025-01-26 18:47:46 -07:00
Matt Borgerson 7cb7bb68a9 nv2a: Multiversion [un]swizzle to optimize for common bpp 2025-01-26 18:47:46 -07:00
Matt Borgerson eae328dc19 nv2a: Move [un]swizzle_rect to swizzle.h 2025-01-26 18:47:46 -07:00
Matt Borgerson bb5ee6865b nv2a: Drop osdep.h, add stdbool.h to swizzle.c 2025-01-26 18:47:46 -07:00
NZJenkins ae4b5c0695
nv2a: Speed up software swizzling 2025-01-26 14:00:35 -07:00
Matt Borgerson 61e29a0678 ci: Bump Windows build container 2025-01-26 04:45:03 -07:00
Matt Borgerson 73e1acfd92 scripts/gen-license.py: Get version from glslang subproj 2025-01-26 03:48:38 -07:00
Matt Borgerson fde688f6e2 meson: Remove libglslang windows special dependency 2025-01-26 03:48:38 -07:00
Matt Borgerson 66dbd9273c meson: Bump glslang 2025-01-26 03:48:38 -07:00
Matt Borgerson e0345bae3e meson: Bump SPIRV-Reflect 2025-01-26 03:48:38 -07:00
Matt Borgerson 572f6161bf meson: Bump volk 2025-01-26 03:48:38 -07:00
Matt Borgerson 03e3cd079a ubuntu-win64-cross: Update Vulkan headers 2025-01-26 01:56:57 -07:00
Matt Borgerson f922d08a5b ubuntu-win64-cross: Drop some deps in favor of subproject wraps 2025-01-26 01:56:57 -07:00
Matt Borgerson ec5c3955ee ubuntu-win64-cross: Bump mxe/build-win64-mxe 2025-01-26 01:56:57 -07:00
Fred Hallock 191bc40f70
xid: Add Xbox Controller S 2025-01-25 20:48:58 -07:00
Daniel P. Berrangé dcb80cd908 crypto: fix bogus error benchmarking pbkdf on fast machines
We're seeing periodic reports of errors like:

$ qemu-img create -f luks --object secret,data=123456,id=sec0 \
                  -o key-secret=sec0 luks-info.img 1M
  Formatting 'luks-info.img', fmt=luks size=1048576 key-secret=sec0
  qemu-img: luks-info.img: Unable to get accurate CPU usage

This error message comes from a recent attempt to workaround a
kernel bug with measuring rusage in long running processes:

  commit c72cab5ad9
  Author: Tiago Pasqualini <tiago.pasqualini@canonical.com>
  Date:   Wed Sep 4 20:52:30 2024 -0300

    crypto: run qcrypto_pbkdf2_count_iters in a new thread

Unfortunately this has a subtle bug on machines which are very fast.

On the first time around the loop, the 'iterations' value is quite
small (1 << 15), and so will run quite fast. Testing has shown that
some machines can complete this benchmarking task in as little as
7 milliseconds.

Unfortunately the 'getrusage' data is not updated at the time of
the 'getrusage' call, it is done asynchronously by the scheduler.
The 7 millisecond completion time for the benchmark is short
enough that 'getrusage' sometimes reports 0 accumulated execution
time.

As a result the 'delay_ms == 0' sanity check in the above commit
is triggering non-deterministically on such machines.

The benchmarking loop intended to run multiple times, increasing
the 'iterations' value until the benchmark ran for > 500 ms, but
the sanity check doesn't allow this to happen.

To fix it, we keep a loop counter and only run the sanity check
after we've been around the loop more than 5 times. At that point
the 'iterations' value is high enough that even with infrequent
updates of 'getrusage' accounting data on fast machines, we should
see a non-zero value.

Fixes: https://lore.kernel.org/qemu-devel/ffe542bb-310c-4616-b0ca-13182f849fd1@redhat.com/
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2336437
Reported-by: Thomas Huth <thuth@redhat.com>
Reported-by: Richard W.M. Jones <rjones@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20250109093746.1216300-1-berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 145f12ea885c8fcfbe2d0ac5230630f071b5a9fb)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-22 21:58:16 +03:00
Zhao Liu 2f5f6cb90a i386/cpu: Mark avx10_version filtered when prefix is NULL
In x86_cpu_filter_features(), if host doesn't support AVX10, the
configured avx10_version should be marked as filtered regardless of
whether prefix is NULL or not.

Check prefix before warn_report() instead of checking for
have_filtered_features.

Cc: qemu-stable@nongnu.org
Fixes: commit bccfb846fd ("target/i386: add AVX10 feature and AVX10 version property")
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Tao Su <tao1.su@linux.intel.com>
Link: https://lore.kernel.org/r/20241106030728.553238-2-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit cf4c263551886964c5d58bd7b675b13fd497b402)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-17 21:54:51 +03:00
Paolo Bonzini 1032dccadb make-release: only leave tarball of wrap-file subprojects
The QEMU source archive is including the sources downloaded from crates.io
in both tarball form (in subprojects/packagecache) and expanded/patched
form (in the subprojects directory).  The former is the more authoritative
form, as it has a hash that can be verified in the wrap file and checked
against the download URL, so keep that one only.  This works also with
--disable-download; when building QEMU for the first time from the
tarball, Meson will print something like

    Using proc-macro2-1-rs source from cache.

for each subproject, and then go on to extract the tarball and apply the
overlay or the patches in subprojects/packagefiles.

Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2719
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit be27b5149c86f81531f8fc609baf3480fc4d9ca0)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-17 21:54:18 +03:00
Li Zhijian acc4e8b69b hw/cxl: Fix msix_notify: Assertion `vector < dev->msix_entries_nr`
This assertion always happens when we sanitize the CXL memory device.
$ echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize

It is incorrect to register an MSIX number beyond the device's capability.

Increase the device's MSIX number to cover the mailbox msix number(9).

Fixes: 43efb0bfad ("hw/cxl/mbox: Wire up interrupts for background completion")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Message-Id: <20250115075834.167504-1-lizhijian@fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 1ce979e7269a34d19ea1a65808df014d8b2acbf6)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-17 10:57:59 +03:00
Igor Mammedov 73ca3ba37d tests: acpi: update expected blobs
_DSM function 7 AML should have followig change:

               If ((Arg2 == 0x07))
               {
  -                Local0 = Package (0x02)
  -                    {
  -                        Zero,
  -                        ""
  -                    }
                   Local2 = AIDX (DerefOf (Arg4 [Zero]), DerefOf (Arg4 [One]
                       ))
  -                Local0 [Zero] = Local2
  +                Local0 = Package (0x02) {}
  +                If (!((Local2 == Zero) || (Local2 == 0xFFFFFFFF)))
  +                {
  +                    Local0 [Zero] = Local2
  +                    Local0 [One] = ""
  +                }
  +
                   Return (Local0)
               }
           }

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20250115125342.3883374-4-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 9fb1c9a1bb26e111ee5fa5538070cd684de14c08)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: regenerate acpi tables for 9.2)
2025-01-17 09:58:21 +03:00
Igor Mammedov 7170aa66aa pci: acpi: Windows 'PCI Label Id' bug workaround
Current versions of Windows call _DSM(func=7) regardless
of whether it is supported or not. It leads to NICs having bogus
'PCI Label Id = 0', where none should be set at all.

Also presence of 'PCI Label Id' triggers another Windows bug
on localized versions that leads to hangs. The later bug is fixed
in latest updates for 'Windows Server' but not in consumer
versions of Windows (and there is no plans to fix it
as far as I'm aware).

Given it's easy, implement Microsoft suggested workaround
(return invalid Package) so that affected Windows versions
could boot on QEMU.
This would effectvely remove bogus 'PCI Label Id's on NICs,
but MS teem confirmed that flipping 'PCI Label Id' should not
change 'Network Connection' ennumeration, so it should be safe
for QEMU to change _DSM without any compat code.

Smoke tested with WinXP and WS2022
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/774
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20250115125342.3883374-3-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 0b053391985abcc40b16ac8fc4a7f6588d1d95c1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-17 09:30:54 +03:00
Igor Mammedov b107128ea6 tests: acpi: whitelist expected blobs
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20250115125342.3883374-2-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 1ad32644fe4c9fb25086be15a66dde1d55d3410f)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-17 09:30:47 +03:00
Nicholas Piggin c8fb662a58 pci/msix: Fix msix pba read vector poll end calculation
The end vector calculation has a bug that results in polling fewer
than required vectors when reading at a non-zero offset in PBA memory.

Fixes: bbef882cc1 ("msi: add API to get notified about pending bit poll")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20241212120402.1475053-1-npiggin@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 42e2a7a0ab23784e44fcb18369e06067abc89305)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-17 09:26:23 +03:00
Sebastian Ott 91b2cb9a78 pci: ensure valid link status bits for downstream ports
PCI hotplug for downstream endpoints on arm fails because Linux'
PCIe hotplug driver doesn't like the QEMU provided LNKSTA:

  pcieport 0000:08:01.0: pciehp: Slot(2): Card present
  pcieport 0000:08:01.0: pciehp: Slot(2): Link Up
  pcieport 0000:08:01.0: pciehp: Slot(2): Cannot train link: status 0x2000

There's 2 cases where LNKSTA isn't setup properly:
* the downstream device has no express capability
* max link width of the bridge is 0

Move the sanity checks added via 88c869198a
("pci: Sanity test minimum downstream LNKSTA") outside of the
branch to make sure downstream ports always have a valid LNKSTA.

Signed-off-by: Sebastian Ott <sebott@redhat.com>
Tested-by: Zhenyu Zhang <zhenyzha@redhat.com>
Message-Id: <20241203121928.14861-1-sebott@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 694632fd44987cc4618612a38ad151047524a590)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-17 09:22:13 +03:00
Phil Dennis-Jordan 4a3538e6f2 hw/usb/hcd-xhci-pci: Use modulo to select MSI vector as per spec
QEMU would crash with a failed assertion if the XHCI controller
attempted to raise the interrupt on an interrupter corresponding
to a MSI vector with a higher index than the highest configured
for the device by the guest driver.

This behaviour is correct on the MSI/PCI side: per PCI 3.0 spec,
devices must ensure they do not send MSI notifications for
vectors beyond the range of those allocated by the system/driver
software. Unlike MSI-X, there is no generic way for handling
aliasing in the case of fewer allocated vectors than requested,
so the specifics are up to device implementors. (Section
6.8.3.4. "Sending Messages")

It turns out the XHCI spec (Implementation Note in section 4.17,
"Interrupters") requires that the host controller signal the MSI
vector with the number computed by taking the interrupter number
modulo the number of enabled MSI vectors.

This change introduces that modulo calculation, fixing the
failed assertion. This makes the device work correctly in MSI mode
with macOS's XHCI driver, which only allocates a single vector.

Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250112210056.16658-2-phil@philjordan.eu>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit bb5b7fced6b5d3334ab20702fc846e47bb1fb731)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-17 09:16:01 +03:00
Gabriel Barrantes 69e29c484f backends/cryptodev-vhost-user: Fix local_error leaks
Do not propagate error to the upper, directly output the error
to avoid leaks.

Fixes: 2fda101de0 ("virtio-crypto: Support asynchronous mode")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2714
Signed-off-by: Gabriel Barrantes <gabriel.barrantes.dev@outlook.com>
Reviewed-by: zhenwei pi <pizhenwei@bytedance.com>
Message-Id: <DM8PR13MB50781054A4FDACE6F4FB6469B30F2@DM8PR13MB5078.namprd13.prod.outlook.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 78b0c15a563ac4be5afb0375602ca0a3adc6c442)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-17 09:16:01 +03:00
Philippe Mathieu-Daudé 3b9b5cbe0a tests/qtest/boot-serial-test: Correct HPPA machine name
Commit 7df6f75117 ("hw/hppa: Split out machine creation")
renamed the 'hppa' machine as 'B160L', but forgot to update
the boot serial test, which ended being skipped.

Cc: qemu-stable@nongnu.org
Fixes: 7df6f75117 ("hw/hppa: Split out machine creation")
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20250102100340.43014-2-philmd@linaro.org>
(cherry picked from commit a87077316ed2f1c1c8ba8faf05feed9dbf0f2fee)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-17 09:16:01 +03:00
Keoseong Park 48876bfc47 hw/ufs: Adjust value to match CPU's endian format
In ufs_write_attr_value(), the value parameter is handled in the CPU's
endian format but provided in big-endian format by the caller. Thus, it
is converted to the CPU's endian format. The related test code is also
fixed to reflect this change.

Fixes: 7c85332a2b ("hw/ufs: minor bug fixes related to ufs-test")
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Keoseong Park <keosung.park@samsung.com>
Reviewed-by: Jeuk Kim <jeuk20.kim@samsung.com>
Message-ID: <20250107084356epcms2p2af4d86432174d76ea57336933e46b4c3@epcms2p2>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 4572dacc33e232a7c951ba7ba7a20887fad29e71)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-15 15:53:53 +03:00
Ryan Wendland 5896b9dc91 vl: Enable audio backend 2025-01-15 03:08:14 -07:00
Ryzee119 e293f6ba67 input: Add xbox live communicator support 2025-01-15 03:08:14 -07:00
Matt Borgerson 4f71be78e2 Info.plist: Add NSMicrophoneUsageDescription key
Required to access microphone on macOS.
2025-01-15 03:05:35 -07:00
Matt Borgerson 6f63e3c4af meson: Add SDL to audio driver priority list on Linux 2025-01-15 03:05:35 -07:00
Matt Borgerson 6de26b0c2c ac97: Disable pi/mc reads for now 2025-01-15 03:05:35 -07:00
Erik Abair 0eddb3ead9 build: Allow CFLAGS to be passed through on macos. 2025-01-14 17:13:11 -07:00
Philippe Mathieu-Daudé bb6940dbad tests/functional/test_rx_gdbsim: Use stable URL for test_linux_sash
Yoshinori said [*] URL references on OSDN were stable, but they
appear not to be. Mirror the artifacts on GitHub to avoid failures
while testing on CI.

[*] https://www.mail-archive.com/qemu-devel@nongnu.org/msg686487.html

Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Reported-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-ID: <20200630202631.7345-1-f4bug@amsat.org>
[huth: Adapt the patch to the new version in the functional framework]
Message-ID: <20241229083419.180423-1-huth@tuxfamily.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit ec2dfb7c389b94d71ee825caa20b709d5df6c166)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: fixup for missing v9.2.0-421-g65d35a4e27a8 "tests/functional: convert tests to new uncompress helper")
2025-01-13 12:28:59 +03:00
Yuan Liu 9a17a65066 multifd: bugfix for incorrect migration data with qatzip compression
When QPL compression is enabled on the migration channel and the same
dirty page changes from a normal page to a zero page in the iterative
memory copy, the dirty page will not be updated to a zero page again
on the target side, resulting in incorrect memory data on the source
and target sides.

The root cause is that the target side does not record the normal pages
to the receivedmap.

The solution is to add ramblock_recv_bitmap_set_offset in target side
to record the normal pages.

Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Jason Zeng <jason.zeng@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20241218091413.140396-4-yuan1.liu@intel.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
(cherry picked from commit a523bc52166c80d8a04d46584f9f3868bd53ef69)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-13 11:25:57 +03:00
Yuan Liu fcd5a157e6 multifd: bugfix for incorrect migration data with QPL compression
When QPL compression is enabled on the migration channel and the same
dirty page changes from a normal page to a zero page in the iterative
memory copy, the dirty page will not be updated to a zero page again
on the target side, resulting in incorrect memory data on the source
and target sides.

The root cause is that the target side does not record the normal pages
to the receivedmap.

The solution is to add ramblock_recv_bitmap_set_offset in target side
to record the normal pages.

Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Jason Zeng <jason.zeng@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20241218091413.140396-3-yuan1.liu@intel.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
(cherry picked from commit 2588a5f99b0c3493b4690e3ff01ed36f80e830cc)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-13 11:25:57 +03:00
Yuan Liu 7fb9ce40e7 multifd: bugfix for migration using compression methods
When compression is enabled on the migration channel and
the pages processed are all zero pages, these pages will
not be sent and updated on the target side, resulting in
incorrect memory data on the source and target sides.

The root cause is that all compression methods call
multifd_send_prepare_common to determine whether to compress
dirty pages, but multifd_send_prepare_common does not update
the IOV of MultiFDPacket_t when all dirty pages are zero pages.

The solution is to always update the IOV of MultiFDPacket_t
regardless of whether the dirty pages are all zero pages.

Fixes: 303e6f54f9 ("migration/multifd: Implement zero page transmission on the multifd thread.")
Cc: qemu-stable@nongnu.org #9.0+
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Jason Zeng <jason.zeng@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20241218091413.140396-2-yuan1.liu@intel.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
(cherry picked from commit cdc3970f8597ebdc1a4c2090cfb4d11e297329ed)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-13 11:25:57 +03:00
Fabiano Rosas 82565fb6b3 migration: Fix arrays of pointers in JSON writer
Currently, if an array of pointers contains a NULL pointer, that
pointer will be encoded as '0' in the stream. Since the JSON writer
doesn't define a "pointer" type, that '0' will now be an uint8, which
is different from the original type being pointed to, e.g. struct.

(we're further calling uint8 "nullptr", but that's irrelevant to the
issue)

That mixed-type array shouldn't be compressed, otherwise data is lost
as the code currently makes the whole array have the type of the first
element:

css = {NULL, NULL, ..., 0x5555568a7940, NULL};

{"name": "s390_css", "instance_id": 0, "vmsd_name": "s390_css",
 "version": 1, "fields": [
    ...,
    {"name": "css", "array_len": 256, "type": "nullptr", "size": 1},
    ...,
]}

In the above, the valid pointer at position 254 got lost among the
compressed array of nullptr.

While we could disable the array compression when a NULL pointer is
found, the JSON part of the stream still makes part of downtime, so we
should avoid writing unecessary bytes to it.

Keep the array compression in place, but if NULL and non-NULL pointers
are mixed break the array into several type-contiguous pieces :

css = {NULL, NULL, ..., 0x5555568a7940, NULL};

{"name": "s390_css", "instance_id": 0, "vmsd_name": "s390_css",
 "version": 1, "fields": [
     ...,
     {"name": "css", "array_len": 254, "type": "nullptr", "size": 1},
     {"name": "css", "type": "struct", "struct": {"vmsd_name": "s390_css_img", ... }, "size": 768},
     {"name": "css", "type": "nullptr", "size": 1},
     ...,
]}

Now each type-discontiguous region will become a new JSON entry. The
reader should interpret this as a concatenation of values, all part of
the same field.

Parsing the JSON with analyze-script.py now shows the proper data
being pointed to at the places where the pointer is valid and
"nullptr" where there's NULL:

"s390_css (14)": {
    ...
    "css": [
        "nullptr",
        "nullptr",
        ...
        "nullptr",
        {
            "chpids": [
            {
                "in_use": "0x00",
                "type": "0x00",
                "is_virtual": "0x00"
            },
            ...
            ]
        },
        "nullptr",
    }

Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20250109185249.23952-7-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
(cherry picked from commit 35049eb0d2fc72bb8c563196ec75b4d6c13fce02)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-13 11:25:57 +03:00
Peter Xu 46f2af3e39 migration: Dump correct JSON format for nullptr replacement
QEMU plays a trick with null pointers inside an array of pointers in a VMSD
field.  See 07d4e69147 ("migration/vmstate: fix array of ptr with
nullptrs") for more details on why.  The idea makes sense in general, but
it may overlooked the JSON writer where it could write nothing in a
"struct" in the JSON hints section.

We hit some analyze-migration.py issues on s390 recently, showing that some
of the struct field contains nothing, like:

{"name": "css", "array_len": 256, "type": "struct", "struct": {}, "size": 1}

As described in details by Fabiano:

https://lore.kernel.org/r/87pll37cin.fsf@suse.de

It could be that we hit some null pointers there, and JSON was gone when
they're null pointers.

To fix it, instead of hacking around only at VMStateInfo level, do that
from VMStateField level, so that JSON writer can also be involved.  In this
case, JSON writer will replace the pointer array (which used to be a
"struct") to be the real representation of the nullptr field.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20250109185249.23952-6-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
(cherry picked from commit 9867c3a7ced12dd7519155c047eb2c0098a11c5f)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-13 11:25:57 +03:00
Fabiano Rosas 3ba6e1164a migration: Rename vmstate_info_nullptr
Rename vmstate_info_nullptr from "uint64_t" to "nullptr". This vmstate
actually reads and writes just a byte, so the proper name would be
uint8. However, since this is a marker for a NULL pointer, it's
convenient to have a more explicit name that can be identified by the
consumers of the JSON part of the stream.

Change the name to "nullptr" and add support for it in the
analyze-migration.py script. Arbitrarily use the name of the type as
the value of the field to avoid the script showing 0x30 or '0', which
could be confusing for readers.

Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20250109185249.23952-5-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
(cherry picked from commit f52965bf0eeee28e89933264f1a9dbdcdaa76a7e)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-13 11:25:57 +03:00
Fabiano Rosas e7a9d93428 s390x: Fix CSS migration
Commit a55ae46683 ("s390: move css_migration_enabled from machine to
css.c") disabled CSS migration globally instead of doing it
per-instance.

CC: Paolo Bonzini <pbonzini@redhat.com>
CC: qemu-stable@nongnu.org #9.1
Fixes: a55ae46683 ("s390: move css_migration_enabled from machine to css.c")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2704
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20250109185249.23952-8-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
(cherry picked from commit c76ee1f6255c3988a9447d363bb17072f1ec84e1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-13 11:25:57 +03:00
Fabiano Rosas e3839b0c19 migration: Fix parsing of s390 stream
The parsing for the S390StorageAttributes section is currently leaving
an unconsumed token that is later interpreted by the generic code as
QEMU_VM_EOF, cutting the parsing short.

The migration will issue a STATTR_FLAG_DONE between iterations, which
the script consumes correctly, but there's a final STATTR_FLAG_EOS at
.save_complete that the script is ignoring. Since the EOS flag is a
u64 0x1ULL and the stream is big endian, on little endian hosts a byte
read from it will be 0x0, the same as QEMU_VM_EOF.

Fixes: 81c2c9dd5d ("tests/qtest/migration-test: Fix analyze-migration.py for s390x")
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20250109185249.23952-4-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
(cherry picked from commit 69d1f784569fdb950f2923c3b6d00d7c1b71acc1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-13 11:25:57 +03:00
Fabiano Rosas abb738ad33 migration: Remove unused argument in vmsd_desc_field_end
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20250109185249.23952-3-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
(cherry picked from commit 2aead53d39b828f8d9d0769ffa3579dadd64d846)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-13 11:25:57 +03:00
Fabiano Rosas ea3b821595 migration: Add more error handling to analyze-migration.py
The analyze-migration script was seen failing in s390x in misterious
ways. It seems we're reaching the VMSDFieldStruct constructor without
any fields, which would indicate an empty .subsection entry, a
VMSTATE_STRUCT with no fields or a vmsd with no fields. We don't have
any of those, at least not without the unmigratable flag set, so this
should never happen.

Add some debug statements so that we can see what's going on the next
time the issue happens.

Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20250109185249.23952-2-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
(cherry picked from commit 86bee9e0c761a3d0e67c43b44001fd752f894cb0)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-13 09:44:33 +03:00
Fabiano Rosas 7e4480dde2 migration/multifd: Fix compat with QEMU < 9.0
Commit f5f48a7891 ("migration/multifd: Separate SYNC request with
normal jobs") changed the multifd source side to stop sending data
along with the MULTIFD_FLAG_SYNC, effectively introducing the concept
of a SYNC-only packet. Relying on that, commit d7e58f412c
("migration/multifd: Don't send ram data during SYNC") later came
along and skipped reading data from SYNC packets.

In a versions timeline like this:

  8.2 f5f48a7 9.0 9.1 d7e58f41 9.2

The issue arises that QEMUs < 9.0 still send data along with SYNC, but
QEMUs > 9.1 don't gather that data anymore. This leads to various
kinds of migration failures due to desync/missing data.

Stop checking for a SYNC packet on the destination and unconditionally
unfill the packet.

>From now on:

old -> new:
the source sends data + sync, destination reads normally

new -> new:
source sends only sync, destination reads zeros

new -> old:
source sends only sync, destination reads zeros

CC: qemu-stable@nongnu.org
Fixes: d7e58f412c ("migration/multifd: Don't send ram data during SYNC")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2720
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Message-Id: <20241213160120.23880-2-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
(cherry picked from commit b93d897ea2f0abbe7fc341a9ac176b5ecd0f3c93)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-01-12 15:54:21 +03:00