* tests, scripts: Don't import print_function from __future__
* Implement FEAT_ATS1A
* Remove deprecated pxa CPU family
* arm/kvm: report registers we failed to set
* Expose SME registers to GDB via gdbstub
* linux-user/aarch64: Generate ESR signal records
* hw/arm/raspi4b: remove redundant check in raspi_add_memory_node
* hw/arm/virt: Allow user-creatable SMMUv3 dev instantiation
* system: drop the -old-param option
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAmjJpt8ZHHBldGVyLm1h
eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3vRGEACO3VrePiMIA9N7egqlUiGn
aRQVqIKeuPVj6TRVG7BSNWlAX8qvnOWOKg1yGVHDZv/nLvRje9UyfUAw7pf6jXod
bzxWBCPJ0J0eOB64Tz87WRCLltKB5pEN+uIG00PtpBcXT1ixYCDgBZXyD3mwuJ4Q
5Yc5hEwQzpmh+EycLtfCHbmjKDw3x1ncpVlGceOG4h5fvzIvIhcNcZJXfAHhbhyO
Y4c5PELrCkCLZaTtSSxd6VJ+vXQ9bNWyKaSZu2KRRnLcMeAqw2Ic7dLPlkzCVyxM
PTOHy4TuDu+kqCbkxdnhpI6fvq5kcHyfTL6qX6tth8ZZS+qKGtvMEIXnYoy6q1kh
4jV5vizK8avx31fSiuTKVpttRv4dC+Aq5QrcgYtIVMeOwtkWHv610D8gcFPmXoG+
uHX9WdzOjrYOzXVKzJaCZF6b7L31ptSEfOrx7asBC9k2wPRwonFXg4JGNq16Yann
aAO5TM7NAUvM2IPgqS+Tf1Bk0iQqORxGfqzCyL76OO/QMMgfBy9elKH0UR0G+ePJ
yjpub1oWIELSXsQGMrdFo1W4/NIpFMTu3DP9W+6XRPu1AvrAx/AsrTuvSvXoeFY9
d/U3yWAXm5XxRzbCIUg7ke8I8zLwRz924M5PA8vophvSnfDLS3V8CJHLwbz/PqYc
0P2KCeI6d2NIhVik4mgEoQ==
=5tK3
-----END PGP SIGNATURE-----
Merge tag 'pull-target-arm-20250916' of https://gitlab.com/pm215/qemu into staging
target-arm queue:
* tests, scripts: Don't import print_function from __future__
* Implement FEAT_ATS1A
* Remove deprecated pxa CPU family
* arm/kvm: report registers we failed to set
* Expose SME registers to GDB via gdbstub
* linux-user/aarch64: Generate ESR signal records
* hw/arm/raspi4b: remove redundant check in raspi_add_memory_node
* hw/arm/virt: Allow user-creatable SMMUv3 dev instantiation
* system: drop the -old-param option
# -----BEGIN PGP SIGNATURE-----
#
# iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAmjJpt8ZHHBldGVyLm1h
# eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3vRGEACO3VrePiMIA9N7egqlUiGn
# aRQVqIKeuPVj6TRVG7BSNWlAX8qvnOWOKg1yGVHDZv/nLvRje9UyfUAw7pf6jXod
# bzxWBCPJ0J0eOB64Tz87WRCLltKB5pEN+uIG00PtpBcXT1ixYCDgBZXyD3mwuJ4Q
# 5Yc5hEwQzpmh+EycLtfCHbmjKDw3x1ncpVlGceOG4h5fvzIvIhcNcZJXfAHhbhyO
# Y4c5PELrCkCLZaTtSSxd6VJ+vXQ9bNWyKaSZu2KRRnLcMeAqw2Ic7dLPlkzCVyxM
# PTOHy4TuDu+kqCbkxdnhpI6fvq5kcHyfTL6qX6tth8ZZS+qKGtvMEIXnYoy6q1kh
# 4jV5vizK8avx31fSiuTKVpttRv4dC+Aq5QrcgYtIVMeOwtkWHv610D8gcFPmXoG+
# uHX9WdzOjrYOzXVKzJaCZF6b7L31ptSEfOrx7asBC9k2wPRwonFXg4JGNq16Yann
# aAO5TM7NAUvM2IPgqS+Tf1Bk0iQqORxGfqzCyL76OO/QMMgfBy9elKH0UR0G+ePJ
# yjpub1oWIELSXsQGMrdFo1W4/NIpFMTu3DP9W+6XRPu1AvrAx/AsrTuvSvXoeFY9
# d/U3yWAXm5XxRzbCIUg7ke8I8zLwRz924M5PA8vophvSnfDLS3V8CJHLwbz/PqYc
# 0P2KCeI6d2NIhVik4mgEoQ==
# =5tK3
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 16 Sep 2025 11:05:19 AM PDT
# gpg: using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg: issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [unknown]
# gpg: aka "Peter Maydell <pmaydell@gmail.com>" [unknown]
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [unknown]
# gpg: aka "Peter Maydell <peter@archaic.org.uk>" [unknown]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE
* tag 'pull-target-arm-20250916' of https://gitlab.com/pm215/qemu: (36 commits)
hw/usb/network: Remove hardcoded 0x40 prefix in STRING_ETHADDR response
qtest/bios-tables-test: Update tables for smmuv3 tests
qtest/bios-tables-test: Add tests for legacy smmuv3 and smmuv3 device
bios-tables-test: Allow for smmuv3 test data.
qemu-options.hx: Document the arm-smmuv3 device
hw/arm/virt: Allow user-creatable SMMUv3 dev instantiation
hw/pci: Introduce pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval
hw/arm/virt: Add an SMMU_IO_LEN macro
hw/arm/virt: Factor out common SMMUV3 dt bindings code
hw/arm/virt-acpi-build: Update IORT for multiple smmuv3 devices
hw/arm/virt-acpi-build: Re-arrange SMMUv3 IORT build
hw/arm/smmu-common: Check SMMU has PCIe Root Complex association
target/arm: Added test case for SME register exposure to GDB
target/arm: Added support for SME register exposure to GDB
target/arm: Increase MAX_PACKET_LENGTH for SME ZA remote gdb debugging
arm/kvm: report registers we failed to set
system: drop the -old-param option
target/arm: Drop ARM_FEATURE_IWMMXT handling
target/arm: Drop ARM_FEATURE_XSCALE handling
target/arm: Remove iwmmxt helper functions
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Kirill Martynov reported assertation in cpu_asidx_from_attrs() being hit
when x86_cpu_dump_state() is called to dump the CPU state[*]. It happens
when the CPU is in SMM and KVM emulation failure due to misbehaving
guest.
The root cause is that QEMU i386 never enables the SMM address space for
cpu since KVM SMM support has been added.
Enable the SMM cpu address space under KVM when the SMM is enabled for
the x86machine.
[*] https://lore.kernel.org/qemu-devel/20250523154431.506993-1-stdcalllevi@yandex-team.ru/
Reported-by: Kirill Martynov <stdcalllevi@yandex-team.ru>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Kirill Martynov <stdcalllevi@yandex-team.ru>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250730095253.1833411-2-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make the code common to all accelerators: after seeing cpu->exit_request
set to true, accelerator code needs to reach qemu_process_cpu_events_common().
So for the common cases where they use qemu_process_cpu_events(), go ahead and
clear it in there. Note that the cheap qatomic_set() is enough because
at this point the thread has taken the BQL; qatomic_set_mb() is not needed.
In particular, this is the ordering of the communication between
I/O and vCPU threads is always the same.
In the I/O thread:
(a) store other memory locations that will be checked if cpu->exit_request
or cpu->interrupt_request is 1 (for example cpu->stop or cpu->work_list
for cpu->exit_request)
(b) cpu_exit(): store-release cpu->exit_request, or
(b) cpu_interrupt(): store-release cpu->interrupt_request
>>> at this point, cpu->halt_cond is broadcast and the BQL released
(c) do the accelerator-specific kick (e.g. write icount_decr for TCG,
pthread_kill for KVM, etc.)
In the vCPU thread instead the opposite order is respected:
(c) the accelerator's execution loop exits thanks to the kick
(b) then the inner execution loop checks cpu->interrupt_request
and cpu->exit_request. If needed cpu->interrupt_request is
converted into cpu->exit_request when work is needed outside
the execution loop.
(a) then the other memory locations are checked. Some may need to
be read under the BQL, but the vCPU thread may also take other
locks (e.g. for queued work items) or none at all.
qatomic_set_mb() would only be needed if the halt sleep was done
outside the BQL (though in that case, cpu->exit_request probably
would be replaced by a QemuEvent or something like that).
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Do so before extending it to the user-mode emulators, where there is no
such thing as an "I/O thread".
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that cpu_exit() actually kicks all accelerators, use it whenever
the message to another thread is processed in qemu_wait_io_event().
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Writes to interrupt_request used non-atomic accesses, but there are a
few cases where the access was not protected by the BQL. Now that
there is a full set of helpers, it's easier to guarantee that
interrupt_request accesses are fully atomic, so just drop the
requirement instead of fixing them.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We deprecated the command line option -old-param for the 10.0
release, which allows us to drop it in 10.2. This option was used to
boot Arm targets with a very old boot protocol using the
'param_struct' ABI. We only ever needed this on a handful of board
types which have all now been removed from QEMU.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20250828162700.3308812-1-peter.maydell@linaro.org
Currently, QEMU refcounts the MR by always taking it from the owner.
It's common that one object will have multiple MR objects embeded in the
object itself. All the MRs in this case share the same lifespan of the
owner object.
It's also common that in the instance_init() of an object, MR A can be a
container of MR B, C, D, by using memory_region_add_subregion*() set of
memory region APIs.
Now we have a circular reference issue, as when adding subregions for MR A,
we essentially incremented the owner's refcount within the instance_init(),
meaning the object will be self-boosted and its refcount can never go down
to zero if the MRs won't get detached properly before object's finalize().
Delete subregions within object's finalize() won't work either, because
finalize() will be invoked only if the refcount goes to zero first. What
is worse, object_finalize() will do object_property_del_all() first before
object_deinit(). Since embeded MRs will be properties of the owner object,
it means they'll be freed _before_ the owner's finalize().
To fix that, teach memory API to stop refcount on MRs that share the same
owner. Because if they share the lifecycle of the owner, then they share
the same lifecycle between themselves, hence the refcount doesn't help but
only introduce troubles.
Meanwhile, allow auto-detachments of MRs during finalize() of MRs even
against its container, as long as they belong to the same owner.
The latter is needed because now it's possible to have MRs' finalize()
happen in any order when they share the same lifespan with a same owner.
In this case, we should allow finalize() to happen in any order of either
the parent or child MR. Loose the mr->container check in MR's finalize()
to allow auto-detach. Double check it shares the same owner.
Proper document this behavior in code.
This patch is heavily based on the work done by Akihiko Odaki:
https://lore.kernel.org/r/CAFEAcA8DV40fGsci76r4yeP1P-SP_QjNRDD2OzPxjx5wRs0GEg@mail.gmail.com
Cc: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Tested-by: Peter Maydell <peter.maydell@linaro.org>
Link: https://lore.kernel.org/r/20250826221750.285242-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
flatview_access_allowed() should pass in the address offset of the memory
region, rather than the global address space. Shouldn't be a major issue
yet, since the addr is only used in an error log.
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Fixes: 3ab6fdc91b ("softmmu/physmem: Introduce MemTxAttrs::memory field and MEMTX_ACCESS_ERROR")
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20250903142932.1038765-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
When compiling QEMU with --enable-ubsan there is a undefined behavior
warning when running the bios-tables-test for example:
.../system/physmem.c:3243:13: runtime error: applying non-zero offset 262144 to null pointer
#0 0x55ac1df5fbc4 in address_space_write_rom_internal .../system/physmem.c:3243:13
The problem is that buf is indeed NULL if the function is e.g. called
with type == FLUSH_CACHE. Add a check to fix the issue.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250728172545.314178-1-thuth@redhat.com>
This patch brings back Jan's idea [1] of BQL-free IO access
This will let us make access to ACPI PM/HPET timers cheaper,
and prevent BQL contention in case of workload that heavily
uses the timers with a lot of vCPUs.
1) 196ea13104 (memory: Add global-locking property to memory regions)
... de7ea885c5 (kvm: Switch to unlocked MMIO)
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20250814160600.2327672-2-imammedo@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The helpers form load-acquire/store-release pair and ensure
that appropriate barriers are in place in case checks happen
outside of BQL.
Use them to replace open-coded checkers/setters across the code,
to make sure that barriers are not missed. Helpers also make code a
bit more readable.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Link: https://lore.kernel.org/r/20250821155603.2422553-1-imammedo@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A use-after-free bug was reported when booting a Linux kernel during the
pci setup phase. It's quite hard to reproduce (needs smp, and favored by
having several pci devices with BAR and specific Linux config, which
is Debian default one in this case).
After investigation (see the associated bug ticket), it appears that,
under specific conditions, we might access a cached AddressSpaceDispatch
that was reclaimed by RCU thread meanwhile.
In the Linux boot scenario, during the pci phase, memory region are
destroyed/recreated, resulting in exposition of the bug.
The core of the issue is that we cache the dispatch associated to
current cpu in cpu->cpu_ases[asidx].memory_dispatch. It is updated with
tcg_commit, which runs asynchronously on a given cpu.
At some point, we leave the rcu critial section, and the RCU thread
starts reclaiming it, but tcg_commit is not yet invoked, resulting in
the use-after-free.
It's not the first problem around this area, and commit 0d58c66068 [1]
("softmmu: Use async_run_on_cpu in tcg_commit") already tried to
address it. It did a good job, but it seems that we found a specific
situation where it's not enough.
This patch takes a simple approach: remove the cached value creating the
issue, and make sure we always get the current mapping for address
space, using address_space_to_dispatch(cpu->cpu_ases[asidx].as).
It's equivalent to qatomic_rcu_read(&as->current_map)->dispatch;
This is not really costly, we just need two dereferences,
including one atomic (rcu) read, which is negligible considering we are
already on mmu slow path anyway.
Note that tcg_commit is still needed, as it's taking care of flushing
TLB, removing previously mapped entries.
Another solution would be to cache directly values under the dispatch
(dispatch themselves are not ref counted), keep an active reference on
associated memory section, and release it when appropriate (tricky).
Given the time already spent debugging this area now and previously, I
strongly prefer eliminating the root of the issue, instead of adding
more complexity for a hypothetical performance gain. RCU is precisely
used to ensure good performance when reading data, so caching is not as
beneficial as it might seem IMHO.
[1] 0d58c66068
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3040
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Message-ID: <20250724161142.2803091-1-pierrick.bouvier@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Generally APIs to the rest of QEMU should be documented in the headers.
Comments on individual functions or internal details are fine to live
in the C files. Make qemu_add_vm_change_state_handler_prio[_full]()
docstrings consistent by moving them from source to header.
Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-Id: <20250715171920.89670-1-philmd@linaro.org>
Only accelerator implementations (and the common accelator
code) need to know about AccelClass internals. Move the
definition out but forward declare AccelState and AccelClass.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250703173248.44995-39-philmd@linaro.org>
Unfortunately "system/accel-ops.h" handlers are not only
system-specific. For example, the cpu_reset_hold() hook
is part of the vCPU creation, after it is realized.
Mechanical rename to drop 'system' using:
$ sed -i -e s_system/accel-ops.h_accel/accel-cpu-ops.h_g \
$(git grep -l system/accel-ops.h)
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250703173248.44995-38-philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250708215320.70426-6-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Like other "-foo help" CLI, the qemu process should return 0 for
"-tpmdev help".
While touching this, switch to is_help_option() utility function as
suggested by Peter Maydell.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20250707101412.2055581-1-marcandre.lureau@redhat.com>
This can be useful for devices that might take too long to shut down
gracefully, but may have a way to shutdown quickly otherwise if needed
or explicitly requested by a force shutdown.
For now we only consider SIGTERM or the QMP quit() command a force
shutdown, since those bypass the guest entirely and are equivalent to
pulling the power plug.
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Message-Id: <20250609212547.2859224-2-d-tatianin@yandex-team.ru>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Coverity reported a unnecessary NULL check:
qemu/system/qdev-monitor.c: 720 in qdev_device_add_from_qdict()
683 /* create device */
684 dev = qdev_new(driver);
...
719 err_del_dev:
>>> CID 1590192: Null pointer dereferences (REVERSE_INULL)
>>> Null-checking "dev" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
720 if (dev) {
721 object_unparent(OBJECT(dev));
722 object_unref(OBJECT(dev));
723 }
724 return NULL;
725 }
Indeed, unlike qdev_try_new() which can return NULL,
qdev_new() always returns a heap pointer (or aborts).
Remove the unnecessary assignment and check.
Fixes: f3a8505656 ("qdev/qbus: add hidden device support")
Resolves: Coverity CID 1590192 (Null pointer dereferences)
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Accelerators call pre_resume() once. Since it isn't a method to
call for each vCPU, move it from AccelOpsClass to AccelClass.
Adapt WHPX.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250702185332.43650-21-philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20250703173248.44995-34-philmd@linaro.org>
In order to dispatch over AccelOpsClass::handle_interrupt(),
we need it always defined, not calling a hidden handler under
the hood. Make AccelOpsClass::handle_interrupt() mandatory.
Expose generic_handle_interrupt() prototype and register it
for each accelerator.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Mads Ynddal <mads@ynddal.dk>
Message-Id: <20250703173248.44995-29-philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20250703173248.44995-34-philmd@linaro.org>
In order to dispatch over AccelOpsClass::handle_interrupt(),
we need it always defined, not calling a hidden handler under
the hood. Make AccelOpsClass::handle_interrupt() mandatory.
Expose generic_handle_interrupt() prototype and register it
for each accelerator.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20250703173248.44995-29-philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20250703173248.44995-5-philmd@linaro.org>
Define qemu_ram_get_fd_offset, so CPR can map a memory region using
IOMMU_IOAS_MAP_FILE in a subsequent patch.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-8-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
A new field, attributes, was introduced in RAMBlock to link to a
RamBlockAttributes object, which centralizes all guest_memfd related
information (such as fd and status bitmap) within a RAMBlock.
Create and initialize the RamBlockAttributes object upon ram_block_add().
Meanwhile, register the object in the target RAMBlock's MemoryRegion.
After that, guest_memfd-backed RAMBlock is associated with the
RamDiscardManager interface, and the users can execute RamDiscardManager
specific handling. For example, VFIO will register the
RamDiscardListener and get notifications when the state_change() helper
invokes.
As coordinate discarding of RAM with guest_memfd is now supported, only
block uncoordinated discard.
Tested-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Alexey Kardashevskiy <aik@amd.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Link: https://lore.kernel.org/r/20250612082747.51539-6-chenyi.qiang@intel.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated
discard") highlighted that subsystems like VFIO may disable RAM block
discard. However, guest_memfd relies on discard operations for page
conversion between private and shared memory, potentially leading to
the stale IOMMU mapping issue when assigning hardware devices to
confidential VMs via shared memory. To address this and allow shared
device assignement, it is crucial to ensure the VFIO system refreshes
its IOMMU mappings.
RamDiscardManager is an existing interface (used by virtio-mem) to
adjust VFIO mappings in relation to VM page assignment. Effectively page
conversion is similar to hot-removing a page in one mode and adding it
back in the other. Therefore, similar actions are required for page
conversion events. Introduce the RamDiscardManager to guest_memfd to
facilitate this process.
Since guest_memfd is not an object, it cannot directly implement the
RamDiscardManager interface. Implementing it in HostMemoryBackend is
not appropriate because guest_memfd is per RAMBlock, and some RAMBlocks
have a memory backend while others do not. Notably, virtual BIOS
RAMBlocks using memory_region_init_ram_guest_memfd() do not have a
backend.
To manage RAMBlocks with guest_memfd, define a new object named
RamBlockAttributes to implement the RamDiscardManager interface. This
object can store the guest_memfd information such as the bitmap for
shared memory and the registered listeners for event notifications. A
new state_change() helper function is provided to notify listeners, such
as VFIO, allowing VFIO to do dynamically DMA map and unmap for the shared
memory according to conversion events. Note that in the current context
of RamDiscardManager for guest_memfd, the shared state is analogous to
being populated, while the private state can be considered discarded for
simplicity. In the future, it would be more complicated if considering
more states like private/shared/discarded at the same time.
In current implementation, memory state tracking is performed at the
host page size granularity, as the minimum conversion size can be one
page per request. Additionally, VFIO expected the DMA mapping for a
specific IOVA to be mapped and unmapped with the same granularity.
Confidential VMs may perform partial conversions, such as conversions on
small regions within a larger one. To prevent such invalid cases and
until support for DMA mapping cut operations is available, all
operations are performed with 4K granularity.
In addition, memory conversion failures cause QEMU to quit rather than
resuming the guest or retrying the operation at present. It would be
future work to add more error handling or rollback mechanisms once
conversion failures are allowed. For example, in-place conversion of
guest_memfd could retry the unmap operation during the conversion from
shared to private. For now, keep the complex error handling out of the
picture as it is not required.
Tested-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Link: https://lore.kernel.org/r/20250612082747.51539-5-chenyi.qiang@intel.com
[peterx: squash fixup from Chenyi to fix builds]
Signed-off-by: Peter Xu <peterx@redhat.com>
Update ReplayRamDiscard() function to return the result and unify the
ReplayRamPopulate() and ReplayRamDiscard() to ReplayRamDiscardState() at
the same time due to their identical definitions. This unification
simplifies related structures, such as VirtIOMEMReplayData, which makes
it cleaner.
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Link: https://lore.kernel.org/r/20250612082747.51539-4-chenyi.qiang@intel.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Modify memory_region_set_ram_discard_manager() to return -EBUSY if a
RamDiscardManager is already set in the MemoryRegion. The caller must
handle this failure, such as having virtio-mem undo its actions and fail
the realize() process. Opportunistically move the call earlier to avoid
complex error handling.
This change is beneficial when introducing a new RamDiscardManager
instance besides virtio-mem. After
ram_block_coordinated_discard_require(true) unlocks all
RamDiscardManager instances, only one instance is allowed to be set for
one MemoryRegion at present.
Suggested-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Tested-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Link: https://lore.kernel.org/r/20250612082747.51539-3-chenyi.qiang@intel.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Modify memory_get_xlat_addr and vfio_get_xlat_addr to return the memory
region that the translated address is found in. This will be needed by
CPR in a subsequent patch to map blocks using IOMMU_IOAS_MAP_FILE.
Also return the xlat offset, so we can simplify the interface by removing
the out parameters that can be trivially derived from mr and xlat.
Lastly, rename the functions to to memory_translate_iotlb() and
vfio_translate_iotlb().
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/qemu-devel/1747661203-136490-1-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
* target/i386/emulate: more lflags cleanups
* meson: remove need for explicit listing of dependencies in hw_common_arch and
target_common_arch
* rust: small fixes
* hpet: Reorganize register decoding to be more similar to Rust code
* target/i386: fixes for AMD models
* target/i386: new EPYC-Turin CPU model
-----BEGIN PGP SIGNATURE-----
iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmg4BxwUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroP67gf+PEP4EDQP0AJUfxXYVsczGf5snGjz
ro8jYmKG+huBZcrS6uPK5zHYxtOI9bHr4ipTHJyHd61lyzN6Ys9amPbs/CRE2Q4x
Ky4AojPhCuaL2wHcYNcu41L+hweVQ3myj97vP3hWvkatulXYeMqW3/4JZgr4WZ69
A9LGLtLabobTz5yLc8x6oHLn/BZ2y7gjd2LzTz8bqxx7C/kamjoDrF2ZHbX9DLQW
BKWQ3edSO6rorSNHWGZsy9BE20AEkW2LgJdlV9eXglFEuEs6cdPKwGEZepade4bQ
Rdt2gHTlQdUDTFmAbz8pttPxFGMC9Zpmb3nnicKJpKQAmkT/x4k9ncjyAQ==
=XmkU
-----END PGP SIGNATURE-----
Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging
* target/i386/kvm: Intel TDX support
* target/i386/emulate: more lflags cleanups
* meson: remove need for explicit listing of dependencies in hw_common_arch and
target_common_arch
* rust: small fixes
* hpet: Reorganize register decoding to be more similar to Rust code
* target/i386: fixes for AMD models
* target/i386: new EPYC-Turin CPU model
# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmg4BxwUHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroP67gf+PEP4EDQP0AJUfxXYVsczGf5snGjz
# ro8jYmKG+huBZcrS6uPK5zHYxtOI9bHr4ipTHJyHd61lyzN6Ys9amPbs/CRE2Q4x
# Ky4AojPhCuaL2wHcYNcu41L+hweVQ3myj97vP3hWvkatulXYeMqW3/4JZgr4WZ69
# A9LGLtLabobTz5yLc8x6oHLn/BZ2y7gjd2LzTz8bqxx7C/kamjoDrF2ZHbX9DLQW
# BKWQ3edSO6rorSNHWGZsy9BE20AEkW2LgJdlV9eXglFEuEs6cdPKwGEZepade4bQ
# Rdt2gHTlQdUDTFmAbz8pttPxFGMC9Zpmb3nnicKJpKQAmkT/x4k9ncjyAQ==
# =XmkU
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 29 May 2025 03:05:00 EDT
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (77 commits)
target/i386/tcg/helper-tcg: fix file references in comments
target/i386: Add support for EPYC-Turin model
target/i386: Update EPYC-Genoa for Cache property, perfmon-v2, RAS and SVM feature bits
target/i386: Add couple of feature bits in CPUID_Fn80000021_EAX
target/i386: Update EPYC-Milan CPU model for Cache property, RAS, SVM feature bits
target/i386: Update EPYC-Rome CPU model for Cache property, RAS, SVM feature bits
target/i386: Update EPYC CPU model for Cache property, RAS, SVM feature bits
rust: make declaration of dependent crates more consistent
docs: Add TDX documentation
i386/tdx: Validate phys_bits against host value
i386/tdx: Make invtsc default on
i386/tdx: Don't treat SYSCALL as unavailable
i386/tdx: Fetch and validate CPUID of TD guest
target/i386: Print CPUID subleaf info for unsupported feature
i386: Remove unused parameter "uint32_t bit" in feature_word_description()
i386/cgs: Introduce x86_confidential_guest_check_features()
i386/tdx: Define supported KVM features for TDX
i386/tdx: Add XFD to supported bit of TDX
i386/tdx: Add supported CPUID bits relates to XFAM
i386/tdx: Add supported CPUID bits related to TD Attributes
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The MachineClass::legacy_fw_cfg_order boolean was only used
by the pc-q35-2.5 and pc-i440fx-2.5 machines, which got
removed. Remove it along with:
- FW_CFG_ORDER_OVERRIDE_* definitions
- fw_cfg_set_order_override()
- fw_cfg_reset_order_override()
- fw_cfg_order[]
- rom_set_order_override()
- rom_reset_order_override()
Simplify CLI and pc_vga_init() / pc_nic_init().
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20250512083948.39294-12-philmd@linaro.org>
[thuth: Fix error from check_patch.pl wrt to an empty "for" loop]
Signed-off-by: Thomas Huth <thuth@redhat.com>
This patch adds the new VM state change cb type `VMChangeStateHandlerWithRet`,
which has return value for `VMChangeStateEntry`.
Thus, we can register a new VM state change cb with return value for device.
Note that `VMChangeStateHandler` and `VMChangeStateHandlerWithRet` are mutually
exclusive and cannot be provided at the same time.
This patch is the pre patch for 'vhost-user: return failure if backend crashes
when live migration', which makes the live migration aware of the loss of
connection with the vhost-user backend and aborts the live migration.
Virtio device will use VMChangeStateHandlerWithRet.
Signed-off-by: Haoqian He <haoqian.he@smartx.com>
Message-Id: <20250416024729.3289157-2-haoqian.he@smartx.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Emscripten does not support partial unmapping of mmapped memory
regions[1]. This limitation prevents correct implementation of qemu_ram_mmap
and qemu_ram_munmap, which rely on partial unmap behavior.
As a workaround, this commit excludes mmap-alloc.c from the Emscripten
build. Instead, for Emscripten build, this modifies qemu_anon_ram_alloc to
use qemu_memalign in place of qemu_ram_mmap, and disable memory backends
that rely on mmap, such as memory-backend-file and memory-backend-shm.
[1] d4a74336f2/system/lib/libc/emscripten_mmap.c (L61)
Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
Link: https://lore.kernel.org/r/76834f933ee4f14eeb5289d21c59d306886e58e9.1745820062.git.ktokunaga.mail@gmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Binaries can register a QOM type to filter their machines
by filling their TargetInfo::machine_typename field.
This can be used by example by main() -> machine_help_func()
to filter the machines list.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
"exec/exec-all.h" is now fully empty, let's remove it.
Mechanical change running:
$ sed -i '/exec\/exec-all.h/d' $(git grep -wl exec/exec-all.h)
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Mark Cave-Ayland <mark.caveayland@nutanix.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250424202412.91612-14-philmd@linaro.org>
Restrict iotlb_to_section(), address_space_translate_for_iotlb()
and memory_region_section_get_iotlb() to TCG. Declare them in
the new "accel/tcg/iommu.h" header. Declare iotlb_to_section()
using the MemoryRegionSection typedef.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250424202412.91612-12-philmd@linaro.org>