qemu-cr16

Author	SHA1	Message	Date
Fiona Ebner	96acc034ff	iotests: add test for changing the 'drive' property via 'qom-set' Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250605100938.43133-1-f.ebner@proxmox.com> [kwolf: Fixed up pylint warnings flagged by 297] Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-15 20:48:27 +02:00
Kevin Wolf	d402da1360	file-posix: Fix aio=threads performance regression after enabling FUA For aio=threads, we're currently not implementing REQ_FUA in any useful way, but just do a separate raw_co_flush_to_disk() call. This changes behaviour compared to the old state, which used bdrv_co_flush() with its optimisations. As a quick fix, call bdrv_co_flush() again like before. Eventually, we can use pwritev2() to make use of RWF_DSYNC if available, but we'll still have to keep this code path as a fallback, so this fix is required either way. While the fix itself is a one-liner, some new graph locking annotations are needed to convince TSA that the locking is correct. Cc: qemu-stable@nongnu.org Fixes: `984a32f17e` ("file-posix: Support FUA writes") Buglink: https://issues.redhat.com/browse/RHEL-96854 Reported-by: Tingting Mao <timao@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20250625085019.27735-1-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 17:12:35 +02:00
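The "pwritev2() with RWF_DSYNC, plus a fallback" idea mentioned in this commit message can be sketched with plain Linux system calls. This is an illustration only, not QEMU's file-posix code: `write_durable()` is a hypothetical helper name, the set of errno values treated as "not supported" is an assumption, and short writes are simply reported to the caller.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for pwritev2() in glibc */
#endif
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not QEMU code): write len bytes from buf at offset
 * so that the data has reached stable storage when the call returns.
 * Returns the number of bytes written, or -1 with errno set.
 */
static ssize_t write_durable(int fd, const void *buf, size_t len, off_t offset)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    ssize_t ret;

#ifdef RWF_DSYNC
    /* Preferred path: request O_DSYNC semantics for this one write. */
    ret = pwritev2(fd, &iov, 1, offset, RWF_DSYNC);
    if (ret >= 0 || (errno != ENOSYS && errno != EOPNOTSUPP)) {
        return ret;
    }
    /* Assumption: ENOSYS/EOPNOTSUPP mean "flag or syscall unavailable". */
#endif

    /* Fallback path: ordinary write followed by an explicit data flush. */
    ret = pwritev(fd, &iov, 1, offset);
    if (ret < 0) {
        return -1;
    }
    if (fdatasync(fd) < 0) {
        return -1;
    }
    return ret;
}
```

The fallback branch is the part that stays necessary even once the fast path exists, which matches the commit's remark that the flush-based code path has to be kept either way.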
Fiona Ebner	430e2be81e	block/qapi: make @node-name in @BlockDeviceInfo non-optional Since commit `15489c769b` ("block: auto-generated node-names"), if the node name of a block driver state is not explicitly specified, it will be auto-generated. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250702123204.325470-3-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 17:11:01 +02:00
Fiona Ebner	cfac5a963e	block/qapi: include child references in block device info In combination with using a throttle filter to enforce IO limits for a guest device, knowing the 'file' child of a block device can be useful. If the throttle filter is only intended for guest IO, block jobs should not also be limited by the throttle filter, so the block operations need to be done with the 'file' child of the top throttle node as the target. In combination with mirroring, the name of that child is not fixed. Another scenario is when unplugging a guest device after mirroring below a top throttle node, where the mirror target is added explicitly via blockdev-add. After mirroring, the target becomes the new 'file' child of the throttle node. For unplugging, both the top throttle node and the mirror target need to be deleted, because only implicitly added child nodes are deleted automatically, and the current 'file' child of the throttle node was explicitly added (as the mirror target). In other scenarios, it could be useful to follow the backing chain. Note that iotests 191 and 273 use _filter_img_info, so the 'children' information is filtered out there. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250702123204.325470-2-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 17:10:57 +02:00
Fiona Ebner	a256a427b0	blockjob: mark block_job_remove_all_bdrv() as GRAPH_UNLOCKED The function block_job_remove_all_bdrv() calls bdrv_graph_wrlock_drained(), which must be called with the graph unlocked. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-49-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:42:28 +02:00
Fiona Ebner	2cf92b15cd	block: mark bdrv_open_child_common() and its callers GRAPH_UNLOCKED The function bdrv_open_child_common() calls bdrv_graph_wrlock_drained(), which must be called with the graph unlocked. Mark it and its two callers bdrv_open_file_child() and bdrv_open_child() as GRAPH_UNLOCKED. This requires temporarily unlocking in vmdk_parse_extents() and making the locked section shorter in vmdk_open(). Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-48-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:42:27 +02:00
Fiona Ebner	ede0859311	block: mark bdrv_close() as GRAPH_UNLOCKED The functions blk_log_writes_close(), blkverify_close(), quorum_close(), vmdk_close() via vmdk_free_extents(), and other bdrv_close() implementations call bdrv_graph_wrlock_drained(), which must be called with the graph unlocked. They are reached via the BlockDriver's bdrv_close() callback and the bdrv_close() wrapper, which are also marked as GRAPH_UNLOCKED_PTR and GRAPH_UNLOCKED. Furthermore, the function bdrv_close() also calls bdrv_drained_begin() and bdrv_graph_wrlock_drained(), so there are additional reasons for marking it GRAPH_UNLOCKED. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-47-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:42:26 +02:00
Fiona Ebner	6d7e3f8de0	block: mark bdrv_close_all() as GRAPH_UNLOCKED The function bdrv_close_all() calls bdrv_drain_all(), which must be called with the graph unlocked. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-46-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:42:25 +02:00
Fiona Ebner	94371745d7	block: mark bdrv_drop_intermediate() as GRAPH_UNLOCKED The function bdrv_drop_intermediate() calls bdrv_drained_begin(), which must be called with the graph unlocked. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-45-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:42:23 +02:00
Fiona Ebner	04f4d9c555	block: mark bdrv_insert_node() as GRAPH_UNLOCKED The function bdrv_insert_node() calls bdrv_drained_begin() which must be called with the graph unlocked. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-44-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:42:22 +02:00
Fiona Ebner	5d04823347	block: mark bdrv_replace_child_bs() as GRAPH_UNLOCKED The function bdrv_replace_child_bs() calls bdrv_drained_begin() which must be called with the graph unlocked. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-43-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:42:21 +02:00
Kevin Wolf	975d9ff32e	block: Allow bdrv_new() with and without graph lock bdrv_new() calls bdrv_drained_begin(), which can poll and therefore can't be called while holding the graph lock. One option to make sure that this call is allowed would be marking bdrv_new() GRAPH_UNLOCKED. However, this is actually an unnecessary restriction because we know that we only just created the BlockDriverState and it isn't even part of the graph yet. We can use bdrv_do_drained_begin_quiesce() instead to avoid the polling, which means that bdrv_new() can now safely be called from callers that hold the graph lock as well as from callers that don't. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:42:15 +02:00
Fiona Ebner	60f609c152	block/commit: mark commit_abort() as GRAPH_UNLOCKED The function commit_abort() calls bdrv_drained_begin(), which must be called with the graph unlocked. Also mark the JobDriver's abort() callback as GRAPH_UNLOCKED_PTR, because that is the callback via which commit_abort() is reached. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-41-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:42:13 +02:00
Fiona Ebner	7bb9bd52ec	block-backend: mark blk_io_limits_disable() as GRAPH_UNLOCKED The function blk_io_limits_disable() calls bdrv_drained_begin(), which must be called with the graph unlocked. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-40-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:42:12 +02:00
Fiona Ebner	f3e84330f7	block: mark blk_drain() as GRAPH_UNLOCKED The function blk_drain() calls bdrv_drained_begin(), which must be called with the graph unlocked. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-39-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:42:11 +02:00
Fiona Ebner	b326b127df	block: mark blk_remove_bs() as GRAPH_UNLOCKED The function blk_remove_bs() calls bdrv_graph_wrlock_drained() and can also call bdrv_drained_begin(), both of which must be called with the graph unlocked. Marking blk_remove_bs() as GRAPH_UNLOCKED requires temporarily unlocking in hmp_drive_del(). Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-38-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:42:10 +02:00
Fiona Ebner	7525aa25db	block: mark bdrv_inactivate_all() as GRAPH_UNLOCKED The function bdrv_inactivate_all() calls bdrv_drain_all_begin(), which must be called with the graph unlocked. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-37-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:42:08 +02:00
Fiona Ebner	e2d9cc5790	block: mark bdrv_inactivate() as GRAPH_RDLOCK and move drain to callers The function bdrv_inactivate() calls bdrv_drain_all_begin(), which needs to be called with the graph unlocked, so either bdrv_inactivate() should be marked as GRAPH_UNLOCKED or the drain needs to be moved to the callers. The caller in qmp_blockdev_set_active() requires that the locked section covers bdrv_find_node() too, so the latter alternative is chosen. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-36-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:42:07 +02:00
Fiona Ebner	6717dc3075	block: mark bdrv_reopen_queue() and bdrv_reopen_multiple() as GRAPH_UNLOCKED The function bdrv_reopen_queue() can call bdrv_drain_all_begin(), which must be called with the graph unlocked. The function bdrv_reopen_multiple() calls bdrv_reopen_prepare() which must be called with the graph unlocked. To mark bdrv_reopen_queue() as GRAPH_UNLOCKED, it is necessary to make the locked section in reopen_backing_file() shorter. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-35-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:42:05 +02:00
Fiona Ebner	0a0474b065	block/stream: mark stream_prepare() as GRAPH_UNLOCKED The function stream_prepare() calls bdrv_drain_all_begin(), which must be called with the graph unlocked. Also mark the JobDriver's prepare() callback as GRAPH_UNLOCKED_PTR, because that is the callback via which stream_prepare() is reached. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-34-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:42:04 +02:00
Fiona Ebner	c6b5328b5b	block/snapshot: mark bdrv_all_delete_snapshot() as GRAPH_UNLOCKED The function bdrv_all_delete_snapshot() calls bdrv_drain_all_begin(), which must be called with the graph unlocked. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-33-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:42:02 +02:00
Fiona Ebner	9ec8c4793f	block-backend: mark blk_drain_all() as GRAPH_UNLOCKED The function blk_drain_all() calls bdrv_drain_all_begin(), which must be called with the graph unlocked. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-32-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:42:00 +02:00
Fiona Ebner	54eb59d668	block: drop wrapper for bdrv_set_backing_hd_drained() Nearly all callers (outside of the tests) are already using the _drained() variant of the function. It doesn't seem worth keeping. Simply adapt the remaining callers of bdrv_set_backing_hd() and rename bdrv_set_backing_hd_drained() to bdrv_set_backing_hd(). Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-31-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:41:58 +02:00
Fiona Ebner	c7af387c7b	blockdev: avoid locking and draining multiple times in external_snapshot_abort() By using the appropriate variants bdrv_set_backing_hd_drained() and bdrv_try_change_aio_context_locked(), there only needs to be a single drained and write-locked section in external_snapshot_abort(). Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-30-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:41:57 +02:00
Fiona Ebner	de0d24c711	block: mark bdrv_set_backing_hd() as GRAPH_UNLOCKED The function bdrv_set_backing_hd() calls bdrv_drain_all_begin(), which must be called with the graph unlocked. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-29-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:41:54 +02:00
Fiona Ebner	d7573eba14	block: call bdrv_set_backing_hd() while unlocked in bdrv_open_backing_file() This is in preparation to mark bdrv_set_backing_hd() as GRAPH_UNLOCKED. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-28-f.ebner@proxmox.com> [kwolf: Removed an extra blank line] Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:41:41 +02:00
Fiona Ebner	47bc2ed6f6	block/commit: switch to bdrv_set_backing_hd_drained() variant This is in preparation to mark bdrv_set_backing_hd() as GRAPH_UNLOCKED. Switch to using the bdrv_set_backing_hd_drained() variant: for the first pair of calls, to avoid draining and locking twice in a row within the individual calls; for the third call, so that the drained and locked section can also cover bdrv_cow_bs(). Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-27-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:41:38 +02:00
Fiona Ebner	9918b2e95e	block/mirror: switch to bdrv_set_backing_hd_drained() variant This is in preparation to mark bdrv_set_backing_hd() as GRAPH_UNLOCKED. Switch to using the bdrv_set_backing_hd_drained() variant, so that the drained and locked section can also cover the calls to bdrv_skip_filters() and bdrv_cow_bs(). Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-26-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:41:34 +02:00
Fiona Ebner	6b89e851fa	block: add bdrv_graph_wrlock_drained() convenience wrapper Many write-locked sections are also drained sections. A new bdrv_graph_wrlock_drained() wrapper around bdrv_graph_wrlock() is introduced, which will begin a drained section first. A global variable is used so bdrv_graph_wrunlock() knows if it also needs to end such a drained section. Both the aio_poll call in bdrv_graph_wrlock() and the aio_bh_poll() in bdrv_graph_wrunlock() can re-enter a write-locked section. While for the latter, ending the drain could be moved to before the call, the former requires that the variable is a counter and not just a boolean. Since the wrapper calls bdrv_drain_all_begin(), which must be called with the graph unlocked, mark the wrapper as GRAPH_UNLOCKED too. The switch to the new helpers was generated with the following commands and then manually checked: find . -name '*.c' -exec sed -i -z 's/bdrv_drain_all_begin();\n\s*bdrv_graph_wrlock();/bdrv_graph_wrlock_drained();/g' {} ';' find . -name '*.c' -exec sed -i -z 's/bdrv_graph_wrunlock();\n\s*bdrv_drain_all_end();/bdrv_graph_wrunlock();/g' {} ';' Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-25-f.ebner@proxmox.com> [kwolf: Removed redundant GRAPH_UNLOCKED] Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:40:58 +02:00
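Why the global variable has to be a counter rather than a boolean can be seen in a small toy model. Everything below is hypothetical and deliberately simplified: the `toy_*` functions and `poll_hook` are stand-ins invented for this sketch, not QEMU's graph-lock implementation. The hook imitates a poll inside the lock function that re-enters another drained, write-locked section.

```c
#include <stddef.h>

/* Toy state; all names are hypothetical stand-ins for illustration. */
static int drain_depth;            /* how many drained sections are open */
static int wrlock_drained_count;   /* counter, deliberately not a boolean */
static void (*poll_hook)(void);    /* imitates code run from a poll */

static void toy_drain_begin(void) { drain_depth++; }
static void toy_drain_end(void)   { drain_depth--; }

/* Taking the lock may poll, and polling may run a nested section. */
static void toy_wrlock(void)
{
    if (poll_hook != NULL) {
        void (*hook)(void) = poll_hook;
        poll_hook = NULL;          /* run the nested code only once */
        hook();
    }
}

/* Begin a drained section first, remember that fact, then lock. */
static void toy_wrlock_drained(void)
{
    toy_drain_begin();
    wrlock_drained_count++;
    toy_wrlock();
}

/* Unlock, and end one drained section if one is still recorded. */
static void toy_wrunlock(void)
{
    if (wrlock_drained_count > 0) {
        wrlock_drained_count--;
        toy_drain_end();
    }
}

/* A complete nested section, as could be entered from the poll. */
static void nested_section(void)
{
    toy_wrlock_drained();
    toy_wrunlock();
}
```

If `poll_hook` is set to `nested_section` before calling `toy_wrlock_drained()`, the nested section opens and closes its own drain while the outer one is still recorded, and the outer `toy_wrunlock()` then brings `drain_depth` back to zero. With a single boolean flag, the nested unlock would clear the flag and the outer unlock would leave its drained section open.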
Fiona Ebner	502f00c51a	block: never use atomics to access bs->quiesce_counter All accesses of bs->quiesce_counter are in the main thread, either after a GLOBAL_STATE_CODE() macro or in a function with GRAPH_WRLOCK annotation. This is essentially a revert of `414c2ec358` ("block: access quiesce_counter with atomic ops"). At that time, neither the GLOBAL_STATE_CODE() macro nor the GRAPH_WRLOCK annotation existed. Even if the field was only accessed in the main thread back then (did not check if that is actually the case), it wouldn't have been easy to verify. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-24-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-07-14 15:40:45 +02:00
Stefan Hajnoczi	9a4e273dde	fpu: Process float_muladd_negate_result after rounding tcg: Use uintptr_t in tcg_malloc implementation linux-user: Hold the fd-trans lock across fork linux-user: Implement fchmodat2 syscall linux-user: Check for EFAULT failure in nanosleep linux-user: Use qemu_set_cloexec() to mark pidfd as FD_CLOEXEC linux-user/gen-vdso: Handle fseek() failure linux-user/gen-vdso: Don't read off the end of buf[] Merge tag 'pull-tcg-20250711' of https://gitlab.com/rth7680/qemu into staging # gpg: Signature made Fri 11 Jul 2025 13:21:13 EDT # gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F # gpg: issuer "richard.henderson@linaro.org" # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full] # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F * tag 'pull-tcg-20250711' of https://gitlab.com/rth7680/qemu: linux-user: Use qemu_set_cloexec() to mark pidfd as FD_CLOEXEC tcg: Use uintptr_t in tcg_malloc implementation linux-user: Hold the fd-trans lock across fork linux-user/mips/o32: Drop sa_restorer functionality linux-user/gen-vdso: Don't read off the end of buf[] linux-user/gen-vdso: Handle fseek() failure linux-user: Check for EFAULT failure in nanosleep linux-user: Implement fchmodat2 syscall fpu: Process float_muladd_negate_result after rounding Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2025-07-13 01:46:04 -04:00
Stefan Hajnoczi	52af79811f	Migration pull request - General cleanups around: postcopy, bg-snapshot, migration hooks, migration completion and formatting of 'info migrate'. - Overhaul of postcopy blocktime tracking. Merge tag 'migration-20250711-pull-request' of https://gitlab.com/farosas/qemu into staging # gpg: Signature made Fri 11 Jul 2025 10:04:08 EDT # gpg: using RSA key AA1B48B0A22326A5A4C364CFC798DC741BEC319D # gpg: issuer "farosas@suse.de" # gpg: Good signature from "Fabiano Rosas <farosas@suse.de>" [unknown] # gpg: aka "Fabiano Almeida Rosas <fabiano.rosas@suse.com>" [unknown] # gpg: WARNING: The key's User ID is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: AA1B 48B0 A223 26A5 A4C3 64CF C798 DC74 1BEC 319D * tag 'migration-20250711-pull-request' of https://gitlab.com/farosas/qemu: (26 commits) migration: Rename save_live_complete_precopy_thread to save_complete_precopy_thread migration/postcopy: Add latency distribution report for blocktime migration/postcopy: blocktime allows track / report non-vCPU faults migration/postcopy: Optimize blocktime fault tracking with hashtable migration/postcopy: Cleanup the total blocktime accounting migration/postcopy: Cache the tid->vcpu mapping for blocktime migration/postcopy: Initialize blocktime context only until listen migration/postcopy: Report fault latencies in blocktime migration/postcopy: Add blocktime fault counts per-vcpu migration/postcopy: Bring blocktime layer to ns level migration/postcopy: Drop PostcopyBlocktimeContext.start_time migration/postcopy: Make all blocktime vars 64bits migration/postcopy: Drop all atomic ops in blocktime feature migration/postcopy: Push blocktime start/end into page req mutex migration: Add option to set postcopy-blocktime migration/postcopy: Avoid clearing dirty bitmap for postcopy too migration: Rewrite the migration complete detect logic migration/ram: Add tracepoints for ram_save_complete() migration/ram: One less indent for ram_find_and_save_block() migration: qemu_savevm_complete*() helpers ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2025-07-13 01:45:30 -04:00
Stefan Hajnoczi	0edc2afe0c	target-arm queue: * New board type max78000fthr * Enable use of CXL on Arm 'virt' board * Some more tidyup of ID register handling * Refactor AT insns and PMU regs into separate source files * Don't enforce NSE,NS check for EL3->EL3 returns * hw/arm/fsl-imx8mp: Wire VIRQ and VFIQ * Allow nested-virtualization with KVM on the 'virt' board * system/qdev: Remove pointless NULL check in qdev_device_add_from_qdict * hw/arm/virt-acpi-build: Don't create ITS id mappings by default * target/arm: Remove unused helper_sme2_luti4_4b Merge tag 'pull-target-arm-20250711' of https://gitlab.com/pm215/qemu into staging # gpg: Signature made Fri 11 Jul 2025 09:29:46 EDT # gpg: using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE # gpg: issuer "peter.maydell@linaro.org" # gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [full] # gpg: aka "Peter Maydell <pmaydell@gmail.com>" [full] # gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [full] # gpg: aka "Peter Maydell <peter@archaic.org.uk>" [unknown] # Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE * tag 'pull-target-arm-20250711' of https://gitlab.com/pm215/qemu: (36 commits) tests/functional: Add a test for the MAX78000 arm machine docs/system: arm: Add max78000 board description target/arm: Remove helper_sme2_luti4_4b hw/arm/virt-acpi-build: Don't create ITS id mappings by default system/qdev: Remove pointless NULL check in qdev_device_add_from_qdict hw/arm/virt: Allow virt extensions with KVM hw/arm/arm_gicv3_kvm: Add a migration blocker with kvm nested virt target/arm: Enable feature ARM_FEATURE_EL2 if EL2 is supported target/arm/kvm: Add helper to detect EL2 when using KVM hw/arm: Allow setting KVM vGIC maintenance IRQ hw/arm/fsl-imx8mp: Wire VIRQ and VFIQ target/arm: Don't enforce NSE,NS check for EL3->EL3 returns target/arm: Split out performance monitor regs to cpregs-pmu.c target/arm: Split out AT insns to tcg/cpregs-at.c target/arm: Drop stub for define_tlb_insn_regs arm/kvm: shorten one overly long line arm/cpu: store clidr into the idregs array arm/cpu: fix trailing ',' for SET_IDREG arm/cpu: store id_aa64afr{0,1} into the idregs array arm/cpu: store id_afr0 into the idregs array ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2025-07-13 01:45:18 -04:00
Stefan Hajnoczi	3adbf0bb8a	* s390x: Allow to select different entries when booting via pxelinux.cfg * Link s390-ccw.img statically * Fix broken bamboo functional test * s390x code cleanups and refactorings Merge tag 'pull-request-2025-07-11' of https://gitlab.com/thuth/qemu into staging # gpg: Signature made Fri 11 Jul 2025 05:32:29 EDT # gpg: using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5 # gpg: issuer "thuth@redhat.com" # gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full] # gpg: aka "Thomas Huth <thuth@redhat.com>" [full] # gpg: aka "Thomas Huth <huth@tuxfamily.org>" [full] # gpg: aka "Thomas Huth <th.huth@posteo.de>" [unknown] # Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5 * tag 'pull-request-2025-07-11' of https://gitlab.com/thuth/qemu: target/s390x: Have s390_cpu_halt() not return anything target/s390x: Expose s390_count_running_cpus() method target/s390x: Remove unused s390_cpu_[un]halt() user stubs tests/functional/test_ppc_bamboo: Replace broken link with working assets tests/functional: Add dependency to the keymap_targets pc-bios: Update the s390 bios images with the pxelinux.cfg loadparm changes pc-bios/s390-ccw: link statically tests/functional: Add a test for s390x pxelinux.cfg network booting pc-bios/s390-ccw: Add a boot menu for booting via pxelinux.cfg pc-bios/s390-ccw: Make get_boot_index() from menu.c global pc-bios/s390-ccw: Allow up to 31 entries for pxelinux.cfg pc-bios/s390-ccw: Allow to select a different pxelinux.cfg entry via loadparm hw/s390x/s390-pci-bus.c: Use g_assert_not_reached() in functions taking an ett target/s390x/tcg: Use vaddr in s390_probe_access() target/s390x/kvm: Use vaddr in find/insert_hw_breakpoint() Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2025-07-13 01:44:51 -04:00
Stefan Hajnoczi	43ec52b4c8	Merge tag 'pull-loongarch-20250711' of https://github.com/bibo-mao/qemu into staging loongarch queue # gpg: Signature made Fri 11 Jul 2025 02:47:32 EDT # gpg: using EDDSA key 0D8642A3A2659F80B0B3D1A41F7B0C1251ACE7D1 # gpg: Good signature from "bibo mao <maobibo@loongson.cn>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 7044 3A00 19C0 E97A 31C7 13C4 8E86 8FB7 A176 9D4C # Subkey fingerprint: 0D86 42A3 A265 9F80 B0B3 D1A4 1F7B 0C12 51AC E7D1 * tag 'pull-loongarch-20250711' of https://github.com/bibo-mao/qemu: target/loongarch: Remove unnecessary page size validity checking target/loongarch: Fix CSR STLBPS register write emulation target/loongarch: Correct spelling in helper_csrwr_pwcl() hw/intc/loongarch_extioi: Move unrealize function to common code Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2025-07-13 01:44:30 -04:00
Peter Maydell	d6390204c6	linux-user: Use qemu_set_cloexec() to mark pidfd as FD_CLOEXEC In the linux-user do_fork() function we try to set the FD_CLOEXEC flag on a pidfd like this: fcntl(pid_fd, F_SETFD, fcntl(pid_fd, F_GETFL) | FD_CLOEXEC); This has two problems: (1) it doesn't check errors, which Coverity complains about (2) we use F_GETFL when we mean F_GETFD Deal with both of these problems by using qemu_set_cloexec() instead. That function will assert() if the fcntls fail, which is fine (we are inside fork_start()/fork_end() so we know nothing can mess around with our file descriptors here, and we just got this one from pidfd_open()). (As we are touching the if() statement here, we correct the indentation.) Coverity: CID 1508111 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20250711141217.1429412-1-peter.maydell@linaro.org>	2025-07-11 10:45:14 -06:00
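The two problems named in the commit above (unchecked errors, and F_GETFL used where F_GETFD was meant) can be illustrated with a minimal sketch. The helper name `set_cloexec_checked` is hypothetical and is not QEMU's qemu_set_cloexec(); unlike that function, which the commit says asserts on failure, this sketch returns an error code.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not QEMU's qemu_set_cloexec()).
 * Sets FD_CLOEXEC on fd. Returns 0 on success, -1 if either fcntl() fails.
 */
static int set_cloexec_checked(int fd)
{
    /* F_GETFD reads the descriptor flags; F_GETFL would read the
     * file status flags (O_NONBLOCK etc.), which is the wrong set. */
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0) {
        return -1;
    }
    if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0) {
        return -1;
    }
    return 0;
}
```

Reading the current descriptor flags first and OR-ing in FD_CLOEXEC preserves any other descriptor flags that may be set.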
Richard Henderson	c86da2b1dd	tcg: Use uintptr_t in tcg_malloc implementation Avoid ubsan failure with clang-20, tcg.h:715:19: runtime error: applying non-zero offset 64 to null pointer by not using pointers. Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2025-07-11 10:43:47 -06:00
Juraj Marcin	beeac2df5f	migration: Rename save_live_complete_precopy_thread to save_complete_precopy_thread Recent patch [1] renames the save_live_complete_precopy handler to save_complete, as the machine is not live in most cases when this handler is executed. The same is true also for save_live_complete_precopy_thread, therefore this patch removes the "live" keyword from the handler itself and related types to keep the naming unified. In contrast to save_complete, this handler is only executed at the end of precopy, therefore the "precopy" keyword is retained. [1]: https://lore.kernel.org/all/20250613140801.474264-7-peterx@redhat.com/ Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Cédric Le Goater <clg@redhat.com> Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250626085235.294690-1-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:39 -03:00
Peter Xu	3345fb3b6d	migration/postcopy: Add latency distribution report for blocktime Add the latency distribution too for blocktime, using order-of-two buckets. It accounts for all the faults, from either vCPU or non-vCPU threads. With prior rework, it's very easy to achieve by adding an array to account for faults in each buckets. Sample output for HMP (while for QMP it's simply an array): Postcopy Latency Distribution: [ 1 us - 2 us ]: 0 [ 2 us - 4 us ]: 0 [ 4 us - 8 us ]: 1 [ 8 us - 16 us ]: 2 [ 16 us - 32 us ]: 2 [ 32 us - 64 us ]: 3 [ 64 us - 128 us ]: 10169 [ 128 us - 256 us ]: 50151 [ 256 us - 512 us ]: 12876 [ 512 us - 1 ms ]: 97 [ 1 ms - 2 ms ]: 42 [ 2 ms - 4 ms ]: 44 [ 4 ms - 8 ms ]: 93 [ 8 ms - 16 ms ]: 138 [ 16 ms - 32 ms ]: 0 [ 32 ms - 65 ms ]: 0 [ 65 ms - 131 ms ]: 0 [ 131 ms - 262 ms ]: 0 [ 262 ms - 524 ms ]: 0 [ 524 ms - 1 sec ]: 0 [ 1 sec - 2 sec ]: 0 [ 2 sec - 4 sec ]: 0 [ 4 sec - 8 sec ]: 0 [ 8 sec - 16 sec ]: 0 Cc: Markus Armbruster <armbru@redhat.com> Acked-by: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-15-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:39 -03:00
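The "order-of-two buckets" in the commit above can be sketched as follows. This is an illustration only, not the QEMU code: the names `LAT_BUCKETS`, `latency_bucket` and `latency_account` are hypothetical, the input is assumed to be in microseconds, and clamping out-of-range values into the first or last bucket is an assumption. The bucket count of 24 matches the number of rows in the sample output.

```c
#include <assert.h>
#include <stdint.h>

#define LAT_BUCKETS 24  /* hypothetical: 24 rows, 1 us up to 16 sec */

/*
 * Map a latency in microseconds to a power-of-two bucket:
 * bucket i covers [2^i us, 2^(i+1) us). Values below 1 us land in
 * bucket 0 and values past the last bucket are clamped (assumption).
 */
static unsigned latency_bucket(uint64_t latency_us)
{
    unsigned idx = 0;

    /* idx = floor(log2(latency_us)), or 0 when latency_us is 0 or 1 */
    while (latency_us > 1) {
        latency_us >>= 1;
        idx++;
    }
    return idx < LAT_BUCKETS ? idx : LAT_BUCKETS - 1;
}

/* Account one resolved fault in the histogram. */
static void latency_account(uint64_t hist[LAT_BUCKETS], uint64_t latency_us)
{
    hist[latency_bucket(latency_us)]++;
}
```

With this mapping a 100 us fault is counted in the "64 us - 128 us" row, the seventh bucket (index 6).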
Peter Xu	ed23a15976	migration/postcopy: blocktime allows track / report non-vCPU faults When used to report page fault latencies, the blocktime feature can be almost useless when KVM async page fault is enabled, because in most cases such remote fault will kickoff async page faults, then it's not trackable from blocktime layer. After all these recent rewrites to blocktime layer, it's finally so easy to also support tracking non-vCPU faults. It'll be even faster if we could always index fault records with TIDs, unfortunately we need to maintain the blocktime API which report things in vCPU indexes. Of course this can work not only for kworkers, but also any guest accesses that may reach a missing page, for example, very likely when in the QEMU main thread too (and all other threads whenever applicable). In this case, we don't care about "how long the threads are blocked", but we only care about "how long the fault will be resolved". Cc: Markus Armbruster <armbru@redhat.com> Cc: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Tested-by: Mario Casquero <mcasquer@redhat.com> Link: https://lore.kernel.org/r/20250613141217.474825-14-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:39 -03:00
Peter Xu	b63a2e9e4b	migration/postcopy: Optimize blocktime fault tracking with hashtable Currently, the postcopy blocktime feature maintains vCPU fault information using an array (vcpu_addr[]). It has two issues. Issue 1: Performance Concern ============================ The old algorithm was almost OK and fast on inserts, except that the lookup is slow and won't scale if there are a lot of vCPUs: when a page is copied during postcopy, mark_postcopy_blocktime_end() will walk the whole array trying to find which vCPUs are blocked by the address. So it needs constant O(N) walk for each page resolution. Alexey (the author of postcopy blocktime) mentioned the perf issue and how to optimize it in a piece of comment in the page resolution path. The comment was (interestingly..) not complete, but it's relatively clear what he wanted to say about this perf issue. Issue 2: Wrong Accounting on re-entrancies ========================================== People might think that each vCPU should only and always get one fault at a time, so that when the blocktime layer captured one fault on one vCPU, we should never see another fault message on this vCPU. It's almost correct, except in some extreme rare cases. Case 1: it's possible the fault thread processes the userfaultfd messages too fast so it can see >1 messages on one vCPU before the previous one was resolved. Case 2: it's theoretically also possible one vCPU can get even more than one message on the same fault address if a fault is retried by the kernel (e.g., handle_userfault() got interrupted before page resolution). As this info might be important, instead of using commit message, I put more details into the code as comment, when introducing an array maintaining concurrent faults on one vCPU. Please refer to the comments for details on both cases, especially case 1 which can be tricky. 
Case 1 sounds rare, but it can be easily reproduced locally for me when we run blocktime together with the migration-test on the vanilla postcopy. New Design ========== This patch should do almost what Alexey mentioned, but slightly differently: instead of having an array to maintain vCPU fault addresses, for each of the fault message we push a message into a hash, indexed by the fault address. With the hash, it can replace the old two structs: both the vcpu_addr[] array, and also the array to store the start time of the fault. However due to above we need one more counter array to account concurrent faults on the same vCPU - that should even be needed in the old code, it's just that the old code was buggy and it will blindly overwrite an existing entry.. now we'll start to really track everything. The hash structure might be more efficient than tree to maintain such addr->(cpu, fault_time) information, so that the insert() and lookup() paths should ideally both be ~O(1). After all, we do not need to sort. Here we need to do one remove() though after the lookup(). It could be slow but only if many vCPUs faulted exactly on the same address (so when the list of cpu entries is long), which should be unlikely. Even with that, it's still a worst case O(N) (consider 400 vCPUs faulted on the same address and how likely is it..) rather than a constant O(N) complexity. When at it, touch up the tracepoints to make them slightly more useful. One tracepoint is added when walking all the fault entries. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-13-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:38 -03:00
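The design described in the commit above (a hash indexed by fault address, holding (cpu, fault_time) entries, plus a per-vCPU counter of concurrent faults) can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the QEMU implementation: all names (`FaultTracker`, `fault_begin`, `fault_end`), the fixed bucket and vCPU counts, the 4 KiB page shift, and the per-fault time accumulation are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define N_BUCKETS 256   /* hypothetical hash size */
#define MAX_VCPUS 8     /* hypothetical vCPU limit */

typedef struct FaultEntry {
    uint64_t addr;             /* faulting address (hash key) */
    int cpu;                   /* vCPU index that faulted */
    uint64_t start_ns;         /* when the fault was seen */
    struct FaultEntry *next;   /* chain of entries in one bucket */
} FaultEntry;

typedef struct {
    FaultEntry *buckets[N_BUCKETS];
    unsigned pending[MAX_VCPUS];     /* concurrent faults per vCPU */
    uint64_t resolved_ns[MAX_VCPUS]; /* summed fault latency per vCPU */
} FaultTracker;

static unsigned fault_hash(uint64_t addr)
{
    return (unsigned)((addr >> 12) % N_BUCKETS); /* assumes 4 KiB pages */
}

/* Record one fault, ~O(1). Returns 0 on success, -1 on bad input or OOM. */
static int fault_begin(FaultTracker *t, uint64_t addr, int cpu, uint64_t now_ns)
{
    if (cpu < 0 || cpu >= MAX_VCPUS) {
        return -1;
    }
    FaultEntry *e = malloc(sizeof(*e));
    if (!e) {
        return -1;
    }
    unsigned h = fault_hash(addr);
    e->addr = addr;
    e->cpu = cpu;
    e->start_ns = now_ns;
    e->next = t->buckets[h];
    t->buckets[h] = e;
    t->pending[cpu]++;   /* never overwrite: count every concurrent fault */
    return 0;
}

/* Page at addr was resolved: remove all matching entries, return the count. */
static unsigned fault_end(FaultTracker *t, uint64_t addr, uint64_t now_ns)
{
    FaultEntry **pp = &t->buckets[fault_hash(addr)];
    unsigned removed = 0;

    while (*pp) {
        FaultEntry *e = *pp;
        if (e->addr == addr) {
            t->resolved_ns[e->cpu] += now_ns - e->start_ns;
            t->pending[e->cpu]--;
            *pp = e->next;   /* unlink, then free */
            free(e);
            removed++;
        } else {
            pp = &e->next;
        }
    }
    return removed;
}
```

The removal walk is only long when many vCPUs fault on the same address, which is the worst case the commit message calls unlikely; otherwise both insert and lookup stay close to constant time.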
Peter Xu	4c8a119485	migration/postcopy: Cleanup the total blocktime accounting The variable vcpu_total_blocktime isn't easy to follow. In reality, it wants to capture the case where all vCPUs are stopped, and now there will be some vCPUs starts running. The name now starts to conflict with vcpu_blocktime_total[], meanwhile it's actually not necessary to have the variable at all: since nobody is touching smp_cpus_down except ourselves, we can safely do the calculation at the end before decrementing smp_cpus_down. Hopefully this makes the logic easier to read, side benefit is we drop one temp var. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-12-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:38 -03:00
Peter Xu	28a185204e	migration/postcopy: Cache the tid->vcpu mapping for blocktime Looking up the vCPU index for each fault can be expensive when there are hundreds of vCPUs. Provide a cache for tid->vcpu instead with a hash table, then look it up from there. While at it, add another counter to record how many non-vCPU faults it gets. For example, the main thread can also access a guest page that was missing. These kinds of faults are not accounted for by blocktime so far. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-11-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:38 -03:00
Peter Xu	f07f2a3092	migration/postcopy: Initialize blocktime context only until listen Before this patch, the blocktime context can be created very early, because postcopy_ram_supported_by_host() <- migrate_caps_check() can happen during migration object init. The trick here is the blocktime context needs system vCPU information, which seems to be possible to change after that point. I didn't verify it, but it doesn't sound right. Now move it out and initialize the context only when postcopy listen starts. That is already during a migration so it should be guaranteed the vCPU topology can never change on both sides. While at it, assert that the ctx isn't created instead this time; the old "if" trick isn't needed when we're sure it will only happen once now. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-10-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:38 -03:00
Peter Xu	b4c82b4288	migration/postcopy: Report fault latencies in blocktime Blocktime so far only cares about the time one vCPU (or the whole system) got blocked. It would also be helpful if it could report the latency of page requests, which could be very sensitive during postcopy. Blocktime itself is sometimes not very important, especially when one thinks about KVM async PF support, which means vCPUs are almost never blocked at all because the guest OS is smart enough to switch to another task when a remote fault is needed. However, latency is still sensitive and important because even if the guest vCPU is running on threads that do not need a remote fault, the workload that accesses some missing page is still affected. Add two entries to the report, showing how long it takes to resolve a remote fault. Mention in the QAPI doc that this is not the real average fault latency, but only covers the ones that were requested for a remote fault. Unwrap get_vcpu_blocktime_list() so we don't need to walk the list twice, meanwhile add the entry checks in qtests for all postcopy tests. Cc: Markus Armbruster <armbru@redhat.com> Cc: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Tested-by: Mario Casquero <mcasquer@redhat.com> Link: https://lore.kernel.org/r/20250613141217.474825-9-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:38 -03:00
Peter Xu	271a1940e9	migration/postcopy: Add blocktime fault counts per-vcpu Add a field to count how many remote faults one vCPU has taken. So far it's still not used, but will be soon. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-8-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:38 -03:00
Peter Xu	a098761f63	migration/postcopy: Bring blocktime layer to ns level With 64-bit fields, it is trivial. The caveat is that the values exposed in QMP are still declared in milliseconds (ms). Hence a conversion is needed when exporting the values to existing QMP queries. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-7-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:38 -03:00
Peter Xu	08fb2a9335	migration/postcopy: Drop PostcopyBlocktimeContext.start_time Now with 64 bits, the offsetting using start_time is not needed anymore, because the array can always remember the whole timestamp. Then drop the unused parameter in get_low_time_offset() altogether. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-6-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:37 -03:00
Peter Xu	b2819530e3	migration/postcopy: Make all blocktime vars 64bits I am guessing it used to be 32 bits because of the atomic ops. Now that all the atomic ops are gone and we're protected by a mutex instead, we can switch to 64 bits. Reasons to move over: - Allow further patches to change the unit from ms to us: with postcopy preempt mode, we're really into hundreds of microseconds level on blocktime. We'd better be able to trap those. - This also paves way for some other tricks that the original version used to avoid overflows, e.g., start_time was almost only useful before to make sure the sampled timestamp won't overflow a 32-bit field. - This prepares further reports on top of existing data collected, e.g. average page fault latencies. When average operation is taken into account, milliseconds are simply too coarse grained. While at it: - Rename page_fault_vcpu_time to vcpu_blocktime_start. - Rename vcpu_blocktime to vcpu_blocktime_total. - Touch up the trace-events to not dump blocktime ctx pointer Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-5-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:37 -03:00
Peter Xu	c0f47dfb5b	migration/postcopy: Drop all atomic ops in blocktime feature Now with the mutex protection it's not needed anymore. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:37 -03:00


122508 commits