Add support for XIVE ESB Interrupt Escalation.
Suggested-by: Michael Kowal <kowal@linux.ibm.com>
[This change was taken from a patch provided by Michael Kowal.]
Signed-off-by: Glenn Miles <milesg@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-18-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
This improves the implementation of pulling pool and phys contexts in
XIVE1, by following closer the OS pulling code.
In particular, the old ring data is returned rather than the modified,
and irq signals are reset on pull.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-17-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Rather than functions to return masks to test NSR bits, have functions
to test those bits directly. This should be no functional change, it
just makes the code more readable.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-16-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Have xive_tctx_accept clear NSR in one shot rather than masking out bits
as they are tested, which makes it clear it's reset to 0, and does not
have a partial NSR value in the register.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-15-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
If CPPR is lowered to preclude the pending interrupt, NSR should be
cleared and the qemu_irq should be lowered. This avoids some cases
of supurious interrupts.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-14-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The group interrupt delivery flow selects the group backlog scan if
LSMFB < IPB, but that scan may find an interrupt with a priority >=
IPB. In that case, the VP-direct interrupt should be chosen. This
extends to selecting the lowest prio between POOL and PHYS rings.
Implement this just by re-starting the selection logic if the
backlog irq was not found or priority did not match LSMFB (LSMFB
is updated so next time around it would see the right value and
not loop infinitely).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-13-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Group interrupts should not be taken from the backlog and presented
if they are precluded by CPPR.
Fixes: 855434b3b8 ("ppc/xive2: Process group backlog when pushing an OS context")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-12-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
According to the XIVE spec, updating the CPPR should also update the
PIPR. The final value of the PIPR depends on other factors, but it
should never be set to a value that is above the CPPR.
Also added support for redistributing an active group interrupt when it
is precluded as a result of changing the CPPR value.
Signed-off-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-11-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
A problem was seen where uart interrupts would be lost resulting in the
console hanging. Traces showed that a lower priority interrupt was
preempting a higher priority interrupt, which would result in the higher
priority interrupt never being handled.
The new interrupt's priority was being compared against the CPPR
(Current Processor Priority Register) instead of the PIPR (Post
Interrupt Priority Register), as was required by the XIVE spec.
This allowed for a window between raising an interrupt and ACK'ing
the interrupt where a lower priority interrupt could slip in.
Fixes: 26c55b9941 ("ppc/xive2: Process group backlog when updating the CPPR")
Signed-off-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-10-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The current xive algorithm for finding a matching group vCPU
target always uses the first vCPU found. And, since it always
starts the search with thread 0 of a core, thread 0 is almost
always used to handle group interrupts. This can lead to additional
interrupt latency and poor performance for interrupt intensive
work loads.
Changing this to use a simple round-robin algorithm for deciding which
thread number to use when starting a search, which leads to a more
distributed use of threads for handling group interrupts.
[npiggin: Also round-robin among threads, not just cores]
Signed-off-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-9-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
When the END Event Queue wraps the END EQ Generation bit is flipped and the
Generation Flipped bit is set to one. On a END cache Watch read operation,
the Generation Flipped bit needs to be reset.
While debugging an error modified END not valid error messages to include
the method since all were the same.
Signed-off-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-8-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Test that the NSR exception bit field is equal to the pool ring value,
rather than any common bits set, which is more correct (although there
is no practical bug because the LSI NSR type is not implemented and
POOL/PHYS NSR are encoded with exclusive bits).
Fixes: 4c3ccac636 ("pnv/xive: Add special handling for pool targets")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-7-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Pushing a context and loading IPB from NVP is defined to merge ('or')
that IPB into the TIMA IPB register. PIPR should therefore be calculated
based on the final IPB value, not just the NVP value.
Fixes: 9d2b6058c5 ("ppc/xive2: Add grouping level to notification")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-6-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
In a multi chip environment there will be remote/forwarded VSDs. The check
to find a matching INT controller (XIVE) of the remote block number was
checking the INTs chip number. Block numbers are not tied to a chip number.
The matching remote INT is the one that matches the forwarded VSD address
with VSD types associated MMIO BAR.
Signed-off-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-5-npiggin@gmail.com
[ clg: Fixed log format in pnv_xive2_get_remote() ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
The queue size of an Event Notification Descriptor (END)
is determined by the 'cl' and QsZ fields of the END.
If the cl field is 1, then the queue size (in bytes) will
be the size of a cache line 128B * 2^QsZ and QsZ is limited
to 4. Otherwise, it will be 4096B * 2^QsZ with QsZ limited
to 12.
Fixes: f8a233dedf ("ppc/xive2: Introduce a XIVE2 core framework")
Signed-off-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-4-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
We use the local variable 'buf' only when we call dma_memory_read(),
and it is always set to &tx_send_buffer[prev_buf_size] immediately
before both of those calls. So remove the variable and pass
tx_send_buffer + prev_buf_size to dma_memory_read().
This fixes in passing a place where we set buf = tx_send_buffer
but never used that value because we always updated buf to
something else later before using it.
Coverity: CID 1534027
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
In gmac_try_send_next_packet() we have code that does "if this block
of data won't fit in the buffer, reallocate it". However, the
condition it uses is
if ((prev_buf_size + tx_buf_len) > sizeof(buf))
where buf is a uint8_t *.
This means that sizeof(buf) is always 8 bytes, and the condition will
almost always be true, so we will reallocate the buffer more often
than we need to.
Correct the condition to test against tx_buffer_size, which is
where we track how big the allocated buffer is.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
After the bug fix in the previous commit, the length and prev_buf_size
variables are identical, except that prev_buf_size is uint32_t and
length is uint16_t. We can therefore unify them. The only place where
the type makes a difference is that we will truncate the packet
at 64K when sending it; this commit preserves that behaviour
by using a local variable when doing the packet send.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The transmit loop in gmac_try_send_next_packet() is constructed in a
way that means it will send incorrect data if it it sends more than
one packet.
The function assembles the outbound data in a dynamically allocated
block of memory which is pointed to by tx_send_buffer. We track the
first point in this block of memory which is not yet used with the
prev_buf_size offset, initially zero. We track the size of the
packet we're sending with the length variable, also initially zero.
As we read chunks of data out of guest memory, we write them to
tx_send_buffer[prev_buf_size], and then increment both prev_buf_size
and length. (We might dynamically reallocate the buffer if needed.)
When we send a packet, we checksum and send length bytes, starting at
tx_send_buffer, and then we reset length to 0. This gives the right
data for the first packet. But we don't reset prev_buf_size. This
means that if we process more descriptors with further data for the
next packet, that data will continue to accumulate at offset
prev_buf_size, i.e. after the data for the first packet. But when
we transmit that second packet, we send length bytes from
tx_send_buffer, so we will send a packet which has the length of the
second packet but the data of the first one.
The fix for this is to also clear prev_buf_size after the packet has
been sent -- we never need the data from packet one after we've sent
it, so we can write packet two's data starting at the beginning of
the buffer.
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
ramfb is a sysbus device so it can only used for machine types where it
is explicitly enabled:
# git grep machine_class_allow_dynamic_sysbus_dev.*TYPE_RAMFB_DEVICE
hw/arm/virt.c: machine_class_allow_dynamic_sysbus_dev(mc,
TYPE_RAMFB_DEVICE);
hw/i386/microvm.c: machine_class_allow_dynamic_sysbus_dev(mc,
TYPE_RAMFB_DEVICE);
hw/i386/pc_piix.c: machine_class_allow_dynamic_sysbus_dev(m,
TYPE_RAMFB_DEVICE);
hw/i386/pc_q35.c: machine_class_allow_dynamic_sysbus_dev(m,
TYPE_RAMFB_DEVICE);
hw/loongarch/virt.c: machine_class_allow_dynamic_sysbus_dev(mc,
TYPE_RAMFB_DEVICE);
hw/riscv/virt.c: machine_class_allow_dynamic_sysbus_dev(mc,
TYPE_RAMFB_DEVICE);
So these six are the only machine types we have to worry about.
The three x86 machine types (pc, q35, microvm) will actually use the rom
(when booting with seabios).
For arm/riscv/loongarch virt we want to disable the rom.
This patch sets ramfb romfile option to false by default, except for x86
machines types (pc, q35, microvm) which need the rom file when booting
with seabios and machine types <= 10.0 (handling the case of arm virt,
for compat reasons).
At the same time, set the "use-legacy-x86-rom" property to true on those
historical versioned machine types in order to avoid the memory layout
being changed.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shaoqin Huang <shahuang@redhat.com>
Message-ID: <20250717100941.2230408-4-shahuang@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Move the TYPE_* to a new file hw/vfio/types.h because the
TYPE_VFIO_PCI will be used in later patch, but directly include the
hw/vfio/pci.h can cause some compilation error when cross build the
windows version.
The hw/vfio/types.h can be included to mitigate that problem.
Signed-off-by: Shaoqin Huang <shahuang@redhat.com>
Message-ID: <20250717100941.2230408-3-shahuang@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Currently the ramfb device loads the vgabios-ramfb.bin unconditionally,
but only the x86 need the vgabios-ramfb.bin, this can cause that when
use the release package on arm64 it can't find the vgabios-ramfb.bin.
Because only seabios will use the vgabios-ramfb.bin, load the rom logic
is x86-specific. For other !x86 platforms, the edk2 ships an EFI driver
for ramfb, so they don't need to load the romfile.
So add a new property use-legacy-x86-rom in both ramfb and vfio_pci
device, because the vfio display also use the ramfb_setup() to load
the vgabios-ramfb.bin file.
After have this property, the machine type can set the compatibility to
not load the vgabios-ramfb.bin if the arch doesn't need it.
For now the default value is true but it will be turned off by default
in subsequent patch when compats get properly handled.
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shaoqin Huang <shahuang@redhat.com>
Message-ID: <20250717100941.2230408-2-shahuang@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
v9fs_path_sprintf() is annotated with G_GNUC_PRINTF(2, 3) in
hw/9pfs/9p.c, but the prototype in hw/9pfs/9p.h is missing the
attribute, so callers that include only the header do not get format
checking.
Move the annotation to the header and delete the duplicate in the
source file. No behavior change.
Signed-off-by: Sean Wei <me@sean.taipei>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250613.qemu.9p.02@sean.taipei>
[CS: fix code style (max. 80 chars per line)]
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
SPCR acpi table can now be disabled
vhost-vdpa can now report hashing capability to guest
PPTT acpi table now tells guest vCPUs are identical
vost-user-blk now shuts down faster
loongarch64 now supports bios-tables-test
intel_iommu now supports ATS
cxl now supports DCD Fabric Management Command Set
arm now supports acpi pci hotplug
fixes, cleanups
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCgAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmh1+7APHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpcZ8H/2udpCZ49vjPB8IwQAGdFTw2TWVdxUQFHexQ
pOsCGyFBNAXqD1bmb8lwWyYVJ08WELyL6xWsQ5tfVPiXpKYYHPHl4rNr/SPoyNcv
joY++tagudmOki2DU7nfJ+rPIIuigOTUHbv4TZciwcHle6f65s0iKXhR1sL0cj4i
TS6iJlApSuJInrBBUxuxSUomXk79mFTNKRiXj1k58LRw6JOUEgYvtIW8i+mOUcTg
h1dZphxEQr/oG+a2pM8GOVJ1AFaBPSfgEnRM4kTX9QuTIDCeMAKUBo/mwOk6PV7z
ZhSrDPLrea27XKGL++EJm0fFJ/AsHF1dTks2+c0rDrSK+UV87Zc=
=sktm
-----END PGP SIGNATURE-----
Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
virtio,pci,pc: features, fixes, tests
SPCR acpi table can now be disabled
vhost-vdpa can now report hashing capability to guest
PPTT acpi table now tells guest vCPUs are identical
vost-user-blk now shuts down faster
loongarch64 now supports bios-tables-test
intel_iommu now supports ATS
cxl now supports DCD Fabric Management Command Set
arm now supports acpi pci hotplug
fixes, cleanups
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCgAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmh1+7APHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpcZ8H/2udpCZ49vjPB8IwQAGdFTw2TWVdxUQFHexQ
# pOsCGyFBNAXqD1bmb8lwWyYVJ08WELyL6xWsQ5tfVPiXpKYYHPHl4rNr/SPoyNcv
# joY++tagudmOki2DU7nfJ+rPIIuigOTUHbv4TZciwcHle6f65s0iKXhR1sL0cj4i
# TS6iJlApSuJInrBBUxuxSUomXk79mFTNKRiXj1k58LRw6JOUEgYvtIW8i+mOUcTg
# h1dZphxEQr/oG+a2pM8GOVJ1AFaBPSfgEnRM4kTX9QuTIDCeMAKUBo/mwOk6PV7z
# ZhSrDPLrea27XKGL++EJm0fFJ/AsHF1dTks2+c0rDrSK+UV87Zc=
# =sktm
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 15 Jul 2025 02:56:48 EDT
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (97 commits)
hw/cxl: mailbox-utils: 0x5605 - FMAPI Initiate DC Release
hw/cxl: mailbox-utils: 0x5604 - FMAPI Initiate DC Add
hw/cxl: Create helper function to create DC Event Records from extents
hw/cxl: mailbox-utils: 0x5603 - FMAPI Get DC Region Extent Lists
hw/cxl: mailbox-utils: 0x5602 - FMAPI Set DC Region Config
hw/mem: cxl_type3: Add DC Region bitmap lock
hw/cxl: Move definition for dynamic_capacity_uuid and enum for DC event types to header
hw/cxl: mailbox-utils: 0x5601 - FMAPI Get Host Region Config
hw/mem: cxl_type3: Add dsmas_flags to CXLDCRegion struct
hw/cxl: mailbox-utils: 0x5600 - FMAPI Get DCD Info
hw/cxl: fix DC extent capacity tracking
tests: virt: Update expected ACPI tables for virt test
hw/acpi/aml-build: Build a root node in the PPTT table
hw/acpi/aml-build: Set identical implementation flag for PPTT processor nodes
tests: virt: Allow changes to PPTT test table
qtest/bios-tables-test: Generate reference blob for DSDT.acpipcihp
qtest/bios-tables-test: Generate reference blob for DSDT.hpoffacpiindex
tests/qtest/bios-tables-test: Add aarch64 ACPI PCI hotplug test
tests/qtest/bios-tables-test: Prepare for addition of acpi pci hp tests
hw/arm/virt: Let virt support pci hotplug/unplug GED event
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Conflicts:
net/vhost-vdpa.c
vhost_vdpa_set_steering_ebpf() was removed, resolve the context
conflict.
Display the CPU model in 'info cpus'. Example before:
$ qemu-system-aarch64 -M xlnx-versal-virt -S -monitor stdio
QEMU 10.0.0 monitor - type 'help' for more information
(qemu) info cpus
* CPU #0: thread_id=42924
CPU #1: thread_id=42924
CPU #2: thread_id=42924
CPU #3: thread_id=42924
(qemu) q
and after:
$ qemu-system-aarch64 -M xlnx-versal-virt -S -monitor stdio
QEMU 10.0.50 monitor - type 'help' for more information
(qemu) info cpus
* CPU #0: thread_id=42916 model=cortex-a72
CPU #1: thread_id=42916 model=cortex-a72
CPU #2: thread_id=42916 model=cortex-r5f
CPU #3: thread_id=42916 model=cortex-r5f
(qemu)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20250715090624.52377-3-philmd@linaro.org>
Knowing the QOM type name of a CPU can be useful,
in particular to infer its model name.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20250715090624.52377-2-philmd@linaro.org>
"hw/xen/arch_hvm.h" only declares the arch_handle_ioreq() and
arch_xen_set_memory() prototypes, which are not used by xen-pvh.c.
Remove the unnecessary header inclusion.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Message-Id: <20250715071528.46196-1-philmd@linaro.org>
Allow capping the maximum total size of in-flight VFIO device state buffers
queued at the destination, otherwise a malicious QEMU source could
theoretically cause the target QEMU to allocate unlimited amounts of memory
for buffers-in-flight.
Since this is not expected to be a realistic threat in most of VFIO live
migration use cases and the right value depends on the particular setup
disable this limit by default by setting it to UINT64_MAX.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Link: https://lore.kernel.org/qemu-devel/4f7cad490988288f58e36b162d7a888ed7e7fd17.1752589295.git.maciej.szmigiero@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
This property allows configuring whether to start the config load only
after all iterables were loaded, during non-iterables loading phase.
Such interlocking is required for ARM64 due to this platform VFIO
dependency on interrupt controller being loaded first.
The property defaults to AUTO, which means ON for ARM, OFF for other
platforms.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Link: https://lore.kernel.org/qemu-devel/0e03c60dbc91f9a9ba2516929574df605b7dfcb4.1752589295.git.maciej.szmigiero@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Introduce x-pci-class-code option to allow users to override PCI class
code of a device, similar to the existing x-pci-vendor-id option. Only
the lower 24 bits of this option are used, though a uint32 is used here
for determining whether the value is valid and set by user.
Additionally, to ensure VGA ranges are only exposed on VGA devices,
pci_register_vga() is now called in vfio_pci_config_setup(), after
the class code override is completed.
This is mainly intended for IGD devices that expose themselves either
as VGA controller (primary display) or Display controller (non-primary
display). The UEFI GOP driver depends on the device reporting a VGA
controller class code (0x030000).
Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250708145211.6179-1-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Coverity reported:
CID 1611805: Uninitialized variables
in vfio_user_dma_map(). This can occur in the happy path when
->async_ops was not set; as this doesn't typically happen, it wasn't
caught during testing.
Align both map and unmap implementations to initialize ret the same way
to resolve this.
Resolves: Coverity CID 1611805
Fixes: 18e899e6 ("vfio-user: implement VFIO_USER_DMA_MAP/UNMAP")
Reported-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Mark Cave-Ayland <mark.caveayland@nutanix.com>
Link: https://lore.kernel.org/qemu-devel/20250715115954.515819-5-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Coverity reported:
CID 1611806: Concurrent data access violations (BAD_CHECK_OF_WAIT_COND)
A wait is performed without a loop. If there is a spurious wakeup, the
condition may not be satisfied.
Fix this by checking ->state for VFIO_PROXY_CLOSED in a loop.
Also rename the callback for clarity.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Mark Cave-Ayland <markcaveayland@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250715115954.515819-4-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
We were not initializing the region fd array to -1, so we would
accidentally try to close(0) on cleanup for any region that is not
referenced.
Fixes: 95cdb024 ("vfio: add region info cache")
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Mark Cave-Ayland <mark.caveayland@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250715115954.515819-3-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
FM DCD Management command 0x5605 implemented per CXL r3.2 Spec Section 7.6.7.6.6
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Anisa Su <anisa.su@samsung.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20250714174509.1984430-12-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
FM DCD Management command 0x5604 implemented per CXL r3.2 Spec Section 7.6.7.6.5
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Anisa Su <anisa.su@samsung.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20250714174509.1984430-11-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Prepatory patch for following FMAPI Add/Release Patches. Refactors part
of qmp_cxl_process_dynamic_capacity_prescriptive() into a helper
function to create DC Event Records and insert in the event log.
Moves definition for CXL_NUM_EXTENTS_SUPPORTED to cxl.h so it can be
accessed by cxl-mailbox-utils.c and cxl-events.c, where the helper
function is defined.
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Anisa Su <anisa.su@samsung.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20250714174509.1984430-10-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
FM DCD Management command 0x5603 implemented per CXL r3.2 Spec Section 7.6.7.6.4
Very similar to previously implemented command 0x4801.
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Anisa Su <anisa.su@samsung.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20250714174509.1984430-9-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
FM DCD Management command 0x5602 implemented per CXL r3.2 Spec Section 7.6.7.6.3
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Anisa Su <anisa.su@samsung.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20250714174509.1984430-8-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add a lock on the bitmap of each CXLDCRegion in preparation for the next
patch which implements FMAPI Set DC Region Configuration. This command
can modify the block size, which means the region's bitmap must be updated
accordingly.
The lock becomes necessary when commands that add/release extents
(meaning they update the bitmap too) are enabled on a different CCI than
the CCI on which the FMAPI commands are enabled.
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Anisa Su <anisa.su@samsung.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20250714174509.1984430-7-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Move definition/enum to cxl_events.h for shared use in next patch
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Anisa Su <anisa.su@samsung.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20250714174509.1984430-6-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
FM DCD Management command 0x5601 implemented per CXL r3.2 Spec Section 7.6.7.6.2
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Anisa Su <anisa.su@samsung.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20250714174509.1984430-5-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add booleans to DC Region struct to represent dsmas flags (defined in CDAT) in
preparation for the next command, which returns the flags in the next mailbox
command 0x5601.
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Anisa Su <anisa.su@samsung.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20250714174509.1984430-4-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>