Commit graph

266 commits

Author SHA1 Message Date
Mark Cave-Ayland
7c773b4267 include/hw/vfio/vfio-device.h: fix include header guard name
The header guard was incorrectly called HW_VFIO_VFIO_COMMON_H instead of
HW_VFIO_VFIO_DEVICE_H.

Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250925113159.1760317-29-mark.caveayland@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-09-25 17:55:20 +02:00
Mark Cave-Ayland
ef70eb32b8 include/hw/vfio/vfio-container-base.h: rename file to vfio-container.h
With the rename of VFIOContainerBase to VFIOContainer, the vfio-container-base.h
header file containing the struct definition is misleading. Rename it from
vfio-container-base.h to vfio-container.h accordingly, fixing up the name
of the include guard at the same time.

Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250925113159.1760317-5-mark.caveayland@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-09-25 17:55:19 +02:00
Mark Cave-Ayland
07cbbfb108 include/hw/vfio/vfio-container.h: rename file to vfio-container-legacy.h
With the rename of VFIOContainer to VFIOLegacyContainer, the vfio-container.h
header file containing the struct definition is misleading. Rename it from
vfio-container.h to vfio-container-legacy.h accordingly, fixing up the name
of the include guard at the same time.

Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250925113159.1760317-4-mark.caveayland@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-09-25 17:55:19 +02:00
Mark Cave-Ayland
e2e269d580 include/hw/vfio/vfio-container-base.h: rename VFIOContainerBase to VFIOContainer
Now that the VFIOContainer struct name is available, rename VFIOContainerBase
to VFIOContainer to better indicate that it is the superclass of other
VFIOFooContainer structs.

Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250925113159.1760317-3-mark.caveayland@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-09-25 17:55:19 +02:00
Mark Cave-Ayland
da9211f28e include/hw/vfio/vfio-container.h: rename VFIOContainer to VFIOLegacyContainer
The VFIOContainer struct represents the legacy VFIO container even though the
name suggests it may be the common superclass of all VFIO containers. Rename it
to VFIOLegacyContainer to make this clearer, which is also a better match for
its VFIO_IOMMU_LEGACY QOM type name.

Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250925113159.1760317-2-mark.caveayland@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-09-25 17:55:19 +02:00
Mark Cave-Ayland
507a118e9f vfio/vfio-container.h: rename VFIOContainer bcontainer field to parent_obj
Now that nothing accesses the bcontainer field directly, rename bcontainer to
parent_obj as per our current coding guidelines.

Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250715093110.107317-8-mark.caveayland@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-09-08 16:46:31 +02:00
Mark Cave-Ayland
98c12de5ae vfio/vfio-container.h: update VFIOContainer declaration
Update the VFIOContainer declaration so that it is closer to our coding
guidelines: emove the explicit typedef (this is already handled by the
OBJECT_DECLARE_TYPE() macro) and add a blank line after the parent object.

Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250715093110.107317-3-mark.caveayland@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-09-08 16:46:31 +02:00
Mark Cave-Ayland
42875d256d vfio/vfio-container-base.h: update VFIOContainerBase declaration
Update the VFIOContainerBase declaration to match our current coding
guidelines: remove the explicit typedef (this is already handled by the
OBJECT_DECLARE_TYPE() macro), add a blank line after the parent object,
rename parent to parent_obj, and move the macro declaration next to the
VFIOContainerBase struct declaration.

Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250715093110.107317-2-mark.caveayland@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-09-08 16:46:31 +02:00
Cédric Le Goater
e7a47f7177 vfio: Move vfio-region.h under hw/vfio/
Since the removal of vfio-platform, header file vfio-region.h no
longer needs to be a public VFIO interface. Move it under hw/vfio.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250901064631.530723-9-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-09-08 16:46:31 +02:00
Cédric Le Goater
762c855439 vfio: Remove 'vfio-platform'
The VFIO_PLATFORM device type has been deprecated in the QEMU 10.0
timeframe. All dependent devices have been removed. Now remove the
core vfio platform framework.

Rename VFIO_DEVICE_TYPE_PLATFORM enum to VFIO_DEVICE_TYPE_UNUSED to
maintain the same index for the CCW and AP VFIO device types.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250901064631.530723-8-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-09-08 16:46:31 +02:00
Cédric Le Goater
8ebc416ac1 vfio: Remove 'vfio-calxeda-xgmac' device
The VFIO_XGMAC device type has been deprecated in the QEMU 10.0
timeframe. Remove it.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250901064631.530723-7-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-09-08 16:46:31 +02:00
Cédric Le Goater
aeb1a50d4a vfio: Remove 'vfio-amd-xgbe' device
The VFIO_AMD_XGBE device type has been deprecated in the QEMU 10.0
timeframe. The AMD "Seattle" device is not supported anymore. Remove it.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250901064631.530723-6-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-09-08 16:46:31 +02:00
Steve Sistare
322ee16824 vfio/pci: preserve pending interrupts
cpr-transfer may lose a VFIO interrupt because the KVM instance is
destroyed and recreated.  If an interrupt arrives in the middle, it is
dropped.  To fix, stop pending new interrupts during cpr save, and pick
up the pieces.  In more detail:

Stop the VCPUs. Call kvm_irqchip_remove_irqfd_notifier_gsi --> KVM_IRQFD to
deassign the irqfd gsi that routes interrupts directly to the VCPU and KVM.
After this call, interrupts fall back to the kernel vfio_msihandler, which
writes to QEMU's kvm_interrupt eventfd.  CPR already preserves that
eventfd.  When the route is re-established in new QEMU, the kernel tests
the eventfd and injects an interrupt to KVM if necessary.

Deassign INTx in a similar manner.  For both MSI and INTx, remove the
eventfd handler so old QEMU does not consume an event.

If an interrupt was already pended to KVM prior to the completion of
kvm_irqchip_remove_irqfd_notifier_gsi, it will be recovered by the
subsequent call to cpu_synchronize_all_states, which pulls KVM interrupt
state to userland prior to saving it in vmstate.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1752689169-233452-3-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-08-09 00:06:48 +02:00
Maciej S. Szmigiero
300dcf58b7 vfio/migration: Max in-flight VFIO device state buffers size limit
Allow capping the maximum total size of in-flight VFIO device state buffers
queued at the destination, otherwise a malicious QEMU source could
theoretically cause the target QEMU to allocate unlimited amounts of memory
for buffers-in-flight.

Since this is not expected to be a realistic threat in most of VFIO live
migration use cases and the right value depends on the particular setup
disable this limit by default by setting it to UINT64_MAX.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Link: https://lore.kernel.org/qemu-devel/4f7cad490988288f58e36b162d7a888ed7e7fd17.1752589295.git.maciej.szmigiero@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-07-15 17:11:12 +02:00
Maciej S. Szmigiero
6380b0a02f vfio/migration: Add x-migration-load-config-after-iter VFIO property
This property allows configuring whether to start the config load only
after all iterables were loaded, during non-iterables loading phase.
Such interlocking is required for ARM64 due to this platform VFIO
dependency on interrupt controller being loaded first.

The property defaults to AUTO, which means ON for ARM, OFF for other
platforms.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Link: https://lore.kernel.org/qemu-devel/0e03c60dbc91f9a9ba2516929574df605b7dfcb4.1752589295.git.maciej.szmigiero@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-07-15 17:11:12 +02:00
Steve Sistare
99cedd5d55 vfio/container: delete old cpr register
vfio_cpr_[un]register_container is no longer used since they were
subsumed by container type-specific registration.  Delete them.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-21-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-07-03 13:42:28 +02:00
Steve Sistare
f2f3e4667e vfio/iommufd: cpr state
VFIO iommufd devices will need access to ioas_id, devid, and hwpt_id in
new QEMU at realize time, so add them to CPR state.  Define CprVFIODevice
as the object which holds the state and is serialized to the vmstate file.
Define accessors to copy state between VFIODevice and CprVFIODevice.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-15-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-07-03 13:42:28 +02:00
Steve Sistare
a6f2f9c42f migration: vfio cpr state hook
Define a list of vfio devices in CPR state, in a subsection so that
older QEMU can be live updated to this version.  However, new QEMU
will not be live updateable to old QEMU.  This is acceptable because
CPR is not yet commonly used, and updates to older versions are unusual.

The contents of each device object will be defined by the vfio subsystem
in a subsequent patch.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-14-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-07-03 13:42:28 +02:00
Steve Sistare
06c6a65852 vfio/iommufd: register container for cpr
Register a vfio iommufd container and device for CPR, replacing the generic
CPR register call with a more specific iommufd register call.  Add a
blocker if the kernel does not support IOMMU_IOAS_CHANGE_PROCESS.

This is mostly boiler plate.  The fields to to saved and restored are added
in subsequent patches.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-13-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-07-03 13:42:28 +02:00
Steve Sistare
a434fd8f64 vfio/iommufd: device name blocker
If an invariant device name cannot be created, block CPR.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-12-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-07-03 13:42:28 +02:00
Steve Sistare
184053f04f vfio/iommufd: add vfio_device_free_name
Define vfio_device_free_name to free the name created by
vfio_device_get_name.  A subsequent patch will do more there.
No functional change.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-11-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-07-03 13:42:28 +02:00
Steve Sistare
fb32965b6d vfio/iommufd: use IOMMU_IOAS_MAP_FILE
Use IOMMU_IOAS_MAP_FILE when the mapped region is backed by a file.
Such a mapping can be preserved without modification during CPR,
because it depends on the file's address space, which does not change,
rather than on the process's address space, which does change.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-9-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-07-03 13:42:28 +02:00
Steve Sistare
7ed0919119 migration: close kvm after cpr
cpr-transfer breaks vfio network connectivity to and from the guest, and
the host system log shows:
  irq bypass consumer (token 00000000a03c32e5) registration fails: -16
which is EBUSY.  This occurs because KVM descriptors are still open in
the old QEMU process.  Close them.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-4-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-07-03 13:42:28 +02:00
Steve Sistare
30edcb4d4e vfio-pci: preserve MSI
Save the MSI message area as part of vfio-pci vmstate, and preserve the
interrupt and notifier eventfd's.  migrate_incoming loads the MSI data,
then the vfio-pci post_load handler finds the eventfds in CPR state,
rebuilds vector data structures, and attaches the interrupts to the new
KVM instance.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-2-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-07-03 13:42:28 +02:00
Zhenzhong Duan
924c3ccb31 vfio/container: Fix vfio_container_post_load()
When there are multiple VFIO containers, vioc->dma_map is restored
multiple times, this made only first container work and remaining
containers using vioc->dma_map restored by first container.

Fix it by save and restore vioc->dma_map locally. saved_dma_map in
VFIOContainerCPR becomes useless and is removed.

Fixes: 7e9f214113 ("vfio/container: restore DMA vaddr")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Link: https://lore.kernel.org/qemu-devel/20250627063332.5173-3-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-07-03 13:42:28 +02:00
John Levon
438d863f1f vfio-user: connect vfio proxy to remote server
Introduce the vfio-user "proxy": this is the client code responsible for
sending and receiving vfio-user messages across the control socket.

The new files hw/vfio-user/proxy.[ch] contain some basic plumbing for
managing the proxy; initialize the proxy during realization of the
VFIOUserPCIDevice instance.

Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250625193012.2316242-3-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-26 08:55:38 +02:00
John Levon
9fca2b7d70 vfio-user: add vfio-user class and container
Introduce basic plumbing for vfio-user with CONFIG_VFIO_USER.

We introduce VFIOUserContainer in hw/vfio-user/container.c, which is a
container type for the "IOMMU" type "vfio-iommu-user", and share some
common container code from hw/vfio/container.c.

Add hw/vfio-user/pci.c for instantiating VFIOUserPCIDevice objects,
sharing some common code from hw/vfio/pci.c.

Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250625193012.2316242-2-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-26 08:55:38 +02:00
John Levon
8d60d069d7 vfio: add documentation for posted write argument
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250616101314.3189793-1-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-26 08:55:37 +02:00
John Levon
b1f521de8b vfio: add vfio_device_get_region_fd()
This keeps the existence of ->region_fds private to hw/vfio/device.c.

Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250616101337.3190027-1-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-26 08:55:37 +02:00
John Levon
079e7216de vfio: improve VFIODeviceIOOps docs
Explicitly describe every parameter rather than summarizing.

Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250611104753.1199796-1-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-11 14:01:58 +02:00
Steve Sistare
031fbb7110 vfio-pci: skip reset during cpr
Do not reset a vfio-pci device during CPR, and do not complain if the
kernel's PCI config space changes for non-emulated bits between the
vmstate save and load, which can happen due to ongoing interrupt activity.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/1749569991-25171-12-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-11 14:01:58 +02:00
Steve Sistare
eba1f657cb vfio/container: recover from unmap-all-vaddr failure
If there are multiple containers and unmap-all fails for some container, we
need to remap vaddr for the other containers for which unmap-all succeeded.
Recover by walking all address ranges of all containers to restore the vaddr
for each.  Do so by invoking the vfio listener callback, and passing a new
"remap" flag that tells it to restore a mapping without re-allocating new
userland data structures.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/1749569991-25171-9-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-11 14:01:58 +02:00
Steve Sistare
dac0dd68d9 vfio/container: mdev cpr blocker
During CPR, after VFIO_DMA_UNMAP_FLAG_VADDR, the vaddr is temporarily
invalid, so mediated devices cannot be supported.  Add a blocker for them.
This restriction will not apply to iommufd containers when CPR is added
for them in a future patch.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/1749569991-25171-8-git-send-email-steven.sistare@oracle.com
[ clg: Fixed context change in VFIODevice ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-11 14:01:58 +02:00
Steve Sistare
7e9f214113 vfio/container: restore DMA vaddr
In new QEMU, do not register the memory listener at device creation time.
Register it later, in the container post_load handler, after all vmstate
that may affect regions and mapping boundaries has been loaded.  The
post_load registration will cause the listener to invoke its callback on
each flat section, and the calls will match the mappings remembered by the
kernel.

The listener calls a special dma_map handler that passes the new VA of each
section to the kernel using VFIO_DMA_MAP_FLAG_VADDR.  Restore the normal
handler at the end.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/1749569991-25171-7-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-11 14:01:58 +02:00
Steve Sistare
c29a65ed68 vfio/container: preserve descriptors
At vfio creation time, save the value of vfio container, group, and device
descriptors in CPR state.  On qemu restart, vfio_realize() finds and uses
the saved descriptors.

During reuse, device and iommu state is already configured, so operations
in vfio_realize that would modify the configuration, such as vfio ioctl's,
are skipped.  The result is that vfio_realize constructs qemu data
structures that reflect the current state of the device.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1749569991-25171-5-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-11 14:01:58 +02:00
Steve Sistare
54857b0816 vfio/container: register container for cpr
Register a legacy container for cpr-transfer, replacing the generic CPR
register call with a more specific legacy container register call.  Add a
blocker if the kernel does not support VFIO_UPDATE_VADDR or VFIO_UNMAP_ALL.

This is mostly boiler plate.  The fields to to saved and restored are added
in subsequent patches.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/1749569991-25171-4-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-11 14:01:58 +02:00
John Levon
a574b06144 vfio: mark posted writes in region write callbacks
For vfio-user, the region write implementation needs to know if the
write is posted; add the necessary plumbing to support this.

Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250607001056.335310-5-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-11 14:01:58 +02:00
John Levon
59adfc6f18 vfio: add per-region fd support
For vfio-user, each region has its own fd rather than sharing
vbasedev's. Add the necessary plumbing to support this, and use the
correct fd in vfio_region_mmap().

Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250607001056.335310-4-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-11 14:01:58 +02:00
Steve Sistare
3ed34463a2 vfio: move vfio-cpr.h
Move vfio-cpr.h to include/hw/vfio, because it will need to be included by
other files there.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1748546679-154091-9-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-05 10:40:38 +02:00
Steve Sistare
2372f8d94a vfio: vfio_find_ram_discard_listener
Define vfio_find_ram_discard_listener as a subroutine so additional calls to
it may be added in a subsequent patch.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1748546679-154091-8-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-05 10:40:38 +02:00
John Levon
44d0acf834 vfio/container: pass MemoryRegion to DMA operations
Pass through the MemoryRegion to DMA operation handlers of vfio
containers. The vfio-user container will need this later, to translate
the vaddr into an offset for the dma map vfio-user message; CPR will
also will need this.

Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Link: https://lore.kernel.org/qemu-devel/20250521215534.2688540-1-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-05 10:40:38 +02:00
John Levon
493a06a2ed vfio: add more VFIOIOMMUClass docs
Add some additional doc comments for these class methods.

Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250520162530.2194548-1-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-05 10:40:38 +02:00
John Levon
d9b7d8b699 vfio/container: pass listener_begin/commit callbacks
The vfio-user container will later need to hook into these callbacks;
set up vfio to use them, and optionally pass them through to the
container.

Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Link: https://lore.kernel.org/qemu-devel/20250507152020.1254632-15-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-05-09 12:42:28 +02:00
John Levon
776066ac90 vfio: add read/write to device IO ops vector
Now we have the region info cache, add ->region_read/write device I/O
operations instead of explicit pread()/pwrite() system calls.

Signed-off-by: John Levon <john.levon@nutanix.com>
Link: https://lore.kernel.org/qemu-devel/20250507152020.1254632-13-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-05-09 12:42:28 +02:00
John Levon
95cdb02451 vfio: add region info cache
Instead of requesting region information on demand with
VFIO_DEVICE_GET_REGION_INFO, maintain a cache: this will become
necessary for performance for vfio-user, where this call becomes a
message over the control socket, so is of higher overhead than the
traditional path.

We will also need it to generalize region accesses, as that means we
can't use ->config_offset for configuration space accesses, but must
look up the region offset (if relevant) each time.

Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250507152020.1254632-12-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-05-09 12:42:28 +02:00
John Levon
38bf025d0d vfio: add device IO ops vector
For vfio-user, device operations such as IRQ handling and region
read/writes are implemented in userspace over the control socket, not
ioctl() to the vfio kernel driver; add an ops vector to generalize this,
and implement vfio_device_io_ops_ioctl for interacting with the kernel
vfio driver.

Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250507152020.1254632-11-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-05-09 12:42:28 +02:00
John Levon
5a22b50591 vfio: add unmap_all flag to DMA unmap callback
We'll use this parameter shortly; this just adds the plumbing.

Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250507152020.1254632-9-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-05-09 12:42:28 +02:00
John Levon
5363a1a117 vfio: add strread/writeerror()
Add simple helpers to correctly report failures from read/write routines
using the return -errno style.

Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250507152020.1254632-7-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-05-09 12:42:28 +02:00
John Levon
5321e623eb vfio: add vfio_device_get_irq_info() helper
Add a helper similar to vfio_device_get_region_info() and use it
everywhere.

Replace a couple of needless allocations with stack variables.

As a side-effect, this fixes a minor error reporting issue in the call
from vfio_msix_early_setup().

Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Link: https://lore.kernel.org/qemu-devel/20250507152020.1254632-5-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-05-09 12:42:28 +02:00
John Levon
ef73671f0b vfio: add vfio_attach_device_by_iommu_type()
Allow attachment by explicitly passing a TYPE_VFIO_IOMMU_* string;
vfio-user will use this later.

Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Link: https://lore.kernel.org/qemu-devel/20250507152020.1254632-4-john.levon@nutanix.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-05-09 12:42:28 +02:00