qemu-cr16

Author	SHA1	Message	Date
Markus Armbruster	ffaa1b50a8	migration: Use warn_reportf_err() where appropriate Replace warn_report("...: %s", ..., error_get_pretty(err)); by warn_reportf_err(err, "...: ", ...); Prior art: commit `5217f1887a` (error: Use error_reportf_err() where appropriate). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20251115083500.2753895-3-armbru@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-21 10:33:21 -05:00
Markus Armbruster	93817ec396	migration: Plug memory leaks after migrate_set_error() migrate_set_error(s, err) stores a copy of @err in @s. The original @err is not freed. Most callers free it immediately. Some callers free it later, or pass it on. And some leak it. Fix those. Perhaps migrate_set_error(s, err) should take ownership of @err. The callers that free it immediately would become simpler, and avoid a copy and a deallocation. The others would have to pass error_copy(err). Signed-off-by: Markus Armbruster <armbru@redhat.com> Link: https://lore.kernel.org/r/20251115083500.2753895-2-armbru@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-21 10:33:21 -05:00
Matthew Rosato	911bdd34ca	migration: set correct list pointer when removing notifier In migration_remove_notifier(), g_slist_remove() will search for and potentially remove an entry from the specified list. The return value should be used to update the potentially-changed head pointer of the list that was just searched (migration_state_notifiers[mode]) instead of the migration blockers list. Fixes: `dc79c7d5e1` ("migration: multi-mode notifier") Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20251113213545.513453-1-mjrosato@linux.ibm.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-21 10:33:21 -05:00
Li Zhijian	0b5bf4ea76	migration: Fix transition to COLO state from precopy Commit `4881411136` ("migration: Always set DEVICE state") set a new DEVICE state before completed during migration, which broke the original transition to COLO. The migration flow for precopy has changed to: active -> pre-switchover -> device -> completed. This patch updates the transition state to ensure that the Pre-COLO state corresponds to DEVICE state correctly. Cc: qemu-stable <qemu-stable@nongnu.org> Fixes: `4881411136` ("migration: Always set DEVICE state") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Zhang Chen <zhangckid@gmail.com> Tested-by: Zhang Chen <zhangckid@gmail.com> Link: https://lore.kernel.org/r/20251104013606.1937764-1-lizhijian@fujitsu.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-21 10:33:21 -05:00
Philippe Mathieu-Daudé	0456a977af	migration/rdma: Check ntohll() availability with meson Commit `44ce1b5d2f` ("migration/rdma: define htonll/ntohll only if not predefined") tried to only include htonll/ntohll replacements when their symbol is defined, but this doesn't work, as they aren't: ../migration/rdma.c:242:17: error: static declaration of 'htonll' follows non-static declaration 242 \| static uint64_t htonll(uint64_t v) \| ^~~~~~ In file included from /usr/include/netinet/in.h:73, from /usr/include/sys/socket.h:32, from /home/f4bug/qemu/include/system/os-posix.h:30, from /home/f4bug/qemu/include/qemu/osdep.h:176, from ../migration/rdma.c:17: /usr/include/sys/byteorder.h:75:18: note: previous declaration of 'htonll' with type 'uint64_t(uint64_t)' {aka 'long unsigned int(long unsigned int)'} 75 \| extern uint64_t htonll(uint64_t); \| ^~~~~~ ../migration/rdma.c:252:17: error: static declaration of 'ntohll' follows non-static declaration 252 \| static uint64_t ntohll(uint64_t v) \| ^~~~~~ /usr/include/sys/byteorder.h:76:18: note: previous declaration of 'ntohll' with type 'uint64_t(uint64_t)' {aka 'long unsigned int(long unsigned int)'} 76 \| extern uint64_t ntohll(uint64_t); \| ^~~~~~ Better to check the symbol availability with meson. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20251117203834.83713-3-philmd@linaro.org>	2025-11-18 19:59:36 +01:00
Eric Blake	ec59a65a4d	qio: Provide accessor around QIONetListener->sioc An upcoming patch needs to pass more than just sioc as the opaque pointer to an AioContext; but since our AioContext code in general (and its QIO Channel wrapper code) lacks a notify callback present with GSource, we do not have the trivial option of just g_malloc'ing a small struct to hold all that data coupled with a notify of g_free. Instead, the data pointer must outlive the registered handler; in fact, having the data pointer have the same lifetime as QIONetListener is adequate. But the cleanest way to stick such a helper struct in QIONetListener will be to rearrange internal struct members. And that in turn means that all existing code that currently directly accesses listener->nsioc and listener->sioc[] should instead go through accessor functions, to be immune to the upcoming struct layout changes. So this patch adds accessor methods qio_net_listener_nsioc() and qio_net_listener_sioc(), and puts them to use. While at it, notice that the pattern of grabbing an sioc from the listener only to turn around and call qio_channel_socket_get_local_address is common enough to also warrant the helper of qio_net_listener_get_local_address, and fix a copy-paste error in the corresponding documentation. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20251113011625.878876-24-eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2025-11-13 10:54:44 -06:00
Juraj Marcin	7b842fe354	migration: Introduce POSTCOPY_DEVICE state Currently, when postcopy starts, the source VM starts switchover and sends a package containing the state of all non-postcopiable devices. When the destination loads this package, the switchover is complete and the destination VM starts. However, if the device state load fails or the destination side crashes, the source side is already in POSTCOPY_ACTIVE state and cannot be recovered, even when it has the most up-to-date machine state as the destination has not yet started. This patch introduces a new POSTCOPY_DEVICE state which is active while the destination machine is loading the device state, is not yet running, and the source side can be resumed in case of a migration failure. Return-path is required for this state to function, otherwise it will be skipped in favor of POSTCOPY_ACTIVE. To transition from POSTCOPY_DEVICE to POSTCOPY_ACTIVE, the source side uses a PONG message that is a response to a PING message processed just before the POSTCOPY_RUN command that starts the destination VM. Thus, this feature is effective even if the destination side does not yet support this new state. Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20251103183301.3840862-9-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Juraj Marcin	0680dd185b	migration: Make postcopy listen thread joinable This patch makes the listen thread joinable instead of detached, and joins it alongside other postcopy threads. Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20251103183301.3840862-8-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Juraj Marcin	b1a17a519b	migration: Respect exit-on-error when migration fails before resuming When exit-on-error was added to migration, it wasn't added to postcopy. Even though postcopy migration will usually pause and not fail, in cases it does unrecoverably fail before destination side has been started, exit-on-error will allow management to query the error. Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20251103183301.3840862-7-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Juraj Marcin	468a7effde	migration: Refactor all incoming cleanup into migration_incoming_destroy() Currently, there are two functions that are responsible for calling the cleanup of the incoming migration state. With successful precopy, it's the incoming migration coroutine, and with successful postcopy it's the postcopy listen thread. However, if postcopy fails during the device load, both functions will try to do the cleanup. This patch refactors all cleanup that needs to be done on the incoming side into a common function and defines a clear boundary for who is responsible for the cleanup. The incoming migration coroutine is responsible for calling the cleanup function, unless the listen thread has been started, in which case the postcopy listen thread runs the incoming migration cleanup in its BH. Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Fixes: `9535435795` ("migration: push Error **errp into qemu_loadvm_state()") Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20251103183301.3840862-6-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Juraj Marcin	67474ebacc	migration: Introduce postcopy incoming setup and cleanup functions After moving postcopy_ram_listen_thread() to postcopy file, this patch introduces a pair of functions, postcopy_incoming_setup() and postcopy_incoming_cleanup(). These functions encapsulate setup and cleanup of all incoming postcopy resources, postcopy-ram and postcopy listen thread. Furthermore, this patch also renames the postcopy_ram_listen_thread to postcopy_listen_thread, as this thread handles not only postcopy-ram, but also dirty-bitmaps and in the future it could handle other postcopiable devices. Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20251103183301.3840862-5-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Juraj Marcin	c9aac1ae10	migration: Move postcopy_ram_listen_thread() to postcopy-ram.c This patch addresses a TODO about moving postcopy_ram_listen_thread() to postcopy file. Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20251103183301.3840862-4-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Peter Xu	26f65c01ed	migration: Do not try to start VM if disk activation fails If a rare split brain happens (e.g. dest QEMU started running somehow, taking shared drive locks), src QEMU may not be able to activate the drives anymore. In this case, src QEMU shouldn't start the VM or it might crash the block layer later with something like: Meanwhile, src QEMU cannot try to continue either even if dest QEMU can release the drive locks (e.g. by QMP "stop"). Because as long as dest QEMU started running, it means dest QEMU's RAM is the only version that is consistent with current status of the shared storage. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20251103183301.3840862-3-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Juraj Marcin	1529ec8f5f	migration: Flush migration channel after sending data of CMD_PACKAGED If the length of the data sent after CMD_PACKAGED is just right, and there is not much data to send afterward, it is possible part of the CMD_PACKAGED payload will get left behind in the sending buffer. This causes the destination side to hang while it tries to load the whole package and initiate postcopy. Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20251103183301.3840862-2-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Vladimir Sementsov-Ogievskiy	8c3843638c	migration: vmsd errp handlers: return bool No code actually depends on specific errno values returned by vmstate_load_state. The only use of it is to check for success, and sometimes inject numeric error values into error messages in migration code. The latter is not a stopper for gradual conversion to "errp + bool return value" APIs. Big analysis of vmstate_load_state() callers, showing that specific errno values are not actually used, is done by Peter here: https://lore.kernel.org/qemu-devel/aQDdRn8t0B8oE3gf@x1.local/ Converting of vmstate_load_state() itself will follow in another series. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Link: https://lore.kernel.org/r/20251028170926.77219-2-vsementsov@yandex-team.ru Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Vladimir Sementsov-Ogievskiy	507685984c	migration/vmstate: stop reporting error number for new _errp APIs The handlers .pre_load_errp, .post_load_errp and .pre_save_errp should put all needed information into errp, we should not append error number here. Note, that there are some more error messages with numeric error codes in this file. We leave them for another day, our current goal is to prepare for the following commit, which will update interface of _errp() APIs. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Link: https://lore.kernel.org/r/20251028170926.77219-1-vsementsov@yandex-team.ru Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Vladimir Sementsov-Ogievskiy	d4b3a3cc55	migration: vmstate_save_state_v(): fix error path In case of pre_save_errp, on error, we continue processing fields, unlike case of pre_save, where we return immediately. Behavior for pre_save_errp case is wrong, we must return here, like for pre_save. Fixes: `40de712a89` ("migration: Add error-parameterized function variants in VMSD struct") Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Link: https://lore.kernel.org/r/20251028130738.29037-2-vsementsov@yandex-team.ru Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Peter Xu	604bb1badc	migration: Properly wait on G_IO_IN when peeking messages migration_channel_read_peek() used to do explicit waits of a short period when peeking message needs retry. Replace it with explicit polls on the io channel, exactly like what qemu_fill_buffer() does. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Link: https://lore.kernel.org/r/20251022192612.2737648-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Peter Xu	1edf0df284	io: Add qio_channel_wait_cond() helper Add the helper to wait for QIO channel's IO availability in any context (coroutine, or non-coroutine). Use it tree-wide for three occurrences. Cc: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Link: https://lore.kernel.org/r/20251022192612.2737648-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Markus Armbruster	5777e20718	migration: Put Error **errp parameter last qapi/error.h's big comment: - Functions that use Error to report errors have an Error **errp parameter. It should be the last parameter, except for functions taking variable arguments. is_only_migratable() and add_blockers() have it in the middle. Clean them up. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20251027064503.1074255-4-armbru@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Markus Armbruster	3ca0a0ab05	migration: Use bitset of MigMode instead of variable arguments migrate_add_blocker_modes() and migration_add_notifier_modes() use variable arguments for a set of migration modes. The variable arguments get collected into a bitset for processing. Take a bitset argument instead, it's simpler. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20251027064503.1074255-3-armbru@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Markus Armbruster	75a9f080c2	migration: Use unsigned instead of int for bit set of MigMode Signed operands in bitwise operations are unwise. I believe they're safe here, but avoiding them is easy, so let's do that. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20251027064503.1074255-2-armbru@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Peter Xu	c2a06e8f28	migration/cpr: Document obscure usage of g_autofree when parse str HMP parsing of cpr_exec_command contains an obscure usage of g_autofree. Provide a document for it to be clear that it's intentional, rather than memory leaked. Cc: Dr. David Alan Gilbert <dave@treblig.org> Reported-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Dr. David Alan Gilbert <dave@treblig.org> Link: https://lore.kernel.org/r/20251023161657.2821652-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Peter Xu	ded3cf4aaf	migration/cpr: Avoid crashing QEMU when cpr-exec runs with no args If a user invokes cpr-exec without setting the exec args first, currently it'll crash QEMU. Avoid it, instead fail the QMP migrate command. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20251021220407.2662288-5-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Peter Xu	89471ef237	migration/cpr: Fix UAF in cpr_exec_cb() when execvp() fails Per reported and analyzed by Peter: https://lore.kernel.org/r/CAFEAcA82ih8RVCm-u1oxiS0V2K4rV4jMzNb13pAV=e2ivmiDRA@mail.gmail.com Fix the issue by moving the error_setg_errno() earlier. When at it, clear argv variable after freed. Resolves: Coverity CID 1641397 Fixes: `a3eae205c6` ("migration: cpr-exec mode") Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20251021220407.2662288-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:09 -05:00
Peter Xu	6a65fdee8a	migration/cpr: Fix coverity report in cpr_exec_persist_state() Per reported and analyzed by Peter: https://lore.kernel.org/r/CAFEAcA_mUQ2NeoguR5efrhw7XYGofnriWEA=+Dg+Ocvyam1wAw@mail.gmail.com mfd leak is a false positive, try to use a coverity annotation (which I didn't find manual myself, but still give it a shot). Fix the other one by capture error if setenv() failed. When at it, pass the error to the top (cpr_state_save()). Along the way, changing all retval to bool when errp is around. Resolves: Coverity CID 1641391 Resolves: Coverity CID 1641392 Fixes: `efc6587313` ("migration: cpr-exec save and load") Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20251021220407.2662288-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:09 -05:00
Peter Xu	6f190736bf	migration: Fix error leak in postcopy_ram_listen_thread() As reported and analyzed by Peter: https://lore.kernel.org/r/CAFEAcA9otBWtR7rPQ0Y9aBm+7ZWJzd4VWpXrAmGr8XspPn+zpw@mail.gmail.com Fix it by freeing the error. When at it, always reset the local_err pointer in both paths. Cc: Arun Menon <armenon@redhat.com> Resolves: Coverity CID 1641390 Fixes: `94272d9b45` ("migration: Capture error in postcopy_ram_listen_thread()") Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20251021220407.2662288-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:09 -05:00
Marco Cavenati	0ecd285824	migration: mapped-ram: handle zero pages Make mapped-ram compatible with loadvm snapshot restoring by explicitly zeroing memory pages in this case. Skip zeroing for -incoming and -loadvm migrations to preserve performance. Signed-off-by: Marco Cavenati <Marco.Cavenati@eurecom.fr> Link: https://lore.kernel.org/r/20251010115954.1995298-3-Marco.Cavenati@eurecom.fr Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:09 -05:00
Marco Cavenati	04a191cb36	migration: add FEATURE_SEEKABLE to QIOChannelBlock Enable the use of the mapped-ram migration feature with savevm/loadvm snapshots by adding the QIO_CHANNEL_FEATURE_SEEKABLE feature to QIOChannelBlock. Implement io_preadv and io_pwritev methods to provide positioned I/O capabilities that don't modify the channel's position pointer. Signed-off-by: Marco Cavenati <Marco.Cavenati@eurecom.fr> Link: https://lore.kernel.org/r/20251010115954.1995298-2-Marco.Cavenati@eurecom.fr Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:09 -05:00
Marco Cavenati	e5423828d6	migration/ram: fix docs of ram_handle_zero Remove outdated 'ch' parameter from the function documentation. Signed-off-by: Marco Cavenati <Marco.Cavenati@eurecom.fr> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20251001161823.2032399-3-Marco.Cavenati@eurecom.fr Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:09 -05:00
Fabiano Rosas	112b55f0b0	migration/savevm: Add a compatibility check for capabilities It has always been possible to enable arbitrary migration capabilities and attempt to take a snapshot of the VM with the savevm/loadvm commands as well as their QMP counterparts snapshot-save/snapshot-load. Most migration capabilities are not meant to be used with snapshots and there's a risk of crashing QEMU or producing incorrect behavior. Ideally, every migration capability would either be implemented for savevm or explicitly rejected. Add a compatibility check routine and reject the snapshot command if an incompatible capability is enabled. For now only act on the two that actually cause a crash: multifd and mapped-ram. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2881 Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20251007184213.5990-1-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:09 -05:00
Philippe Mathieu-Daudé	4db362f68c	system/physmem: Extract API out of 'system/ram_addr.h' header Very few files use the Physical Memory API. Declare its methods in their own header: "system/physmem.h". Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Message-Id: <20251001175448.18933-19-philmd@linaro.org>	2025-10-07 05:03:56 +02:00
Philippe Mathieu-Daudé	aa60bdb700	system/physmem: Drop 'cpu_' prefix in Physical Memory API The functions related to the Physical Memory API declared in "system/ram_addr.h" do not operate on vCPU. Remove the 'cpu_' prefix. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Message-Id: <20251001175448.18933-18-philmd@linaro.org>	2025-10-07 05:03:56 +02:00
Philippe Mathieu-Daudé	8bf3a88308	system/physmem: Reduce cpu_physical_memory_sync_dirty_bitmap() scope cpu_physical_memory_sync_dirty_bitmap() is now only called within system/physmem.c, by ramblock_sync_dirty_bitmap(). Reduce its scope by making it internal to this file. Since it doesn't involve any CPU, remove the 'cpu_' prefix. Remove the now unneeded "qemu/rcu.h" and "system/memory.h" headers. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20251001175448.18933-17-philmd@linaro.org>	2025-10-07 05:03:56 +02:00
Philippe Mathieu-Daudé	34f9b0ad08	system/ramblock: Move ram_block_is_pmem() declaration Move ramblock_is_pmem() along with the RAM Block API exposed by the "system/ramblock.h" header. Rename as ram_block_is_pmem() to keep API prefix consistency. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Acked-by: Peter Xu <peterx@redhat.com> Message-Id: <20251002032812.26069-3-philmd@linaro.org>	2025-10-07 03:37:03 +02:00
Steve Sistare	a3eae205c6	migration: cpr-exec mode Add the cpr-exec migration mode. Usage: qemu-system-$arch -machine aux-ram-share=on ... migrate_set_parameter mode cpr-exec migrate_set_parameter cpr-exec-command \ <arg1> <arg2> ... -incoming <uri-1> \ migrate -d <uri-1> The migrate command stops the VM, saves state to uri-1, directly exec's a new version of QEMU on the same host, replacing the original process while retaining its PID, and loads state from uri-1. Guest RAM is preserved in place, albeit with new virtual addresses. The new QEMU process is started by exec'ing the command specified by the @cpr-exec-command parameter. The first word of the command is the binary, and the remaining words are its arguments. The command may be a direct invocation of new QEMU, or may be a non-QEMU command that exec's the new QEMU binary. This mode creates a second migration channel that is not visible to the user. At the start of migration, old QEMU saves CPR state to the second channel, and at the end of migration, it tells the main loop to call cpr_exec. New QEMU loads CPR state early, before objects are created. Because old QEMU terminates when new QEMU starts, one cannot stream data between the two, so uri-1 must be a type, such as a file, that accepts all data before old QEMU exits. Otherwise, old QEMU may quietly block writing to the channel. Memory-backend objects must have the share=on attribute, but memory-backend-epc is not supported. The VM must be started with the '-machine aux-ram-share=on' option, which allows anonymous memory to be transferred in place to the new process. The memfds are kept open across exec by clearing the close-on-exec flag, their values are saved in CPR state, and they are mmap'd in new QEMU. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Acked-by: Markus Armbruster <armbru@redhat.com> Link: https://lore.kernel.org/r/1759332851-370353-7-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Steve Sistare	efc6587313	migration: cpr-exec save and load To preserve CPR state across exec, create a QEMUFile based on a memfd, and keep the memfd open across exec. Save the value of the memfd in an environment variable so post-exec QEMU can find it. These new functions are called in a subsequent patch. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Link: https://lore.kernel.org/r/1759332851-370353-6-git-send-email-steven.sistare@oracle.com [peterx: fix build for Windows] Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Steve Sistare	f57ff59f1e	migration: cpr-exec-command parameter Create the cpr-exec-command migration parameter, defined as a list of strings. It will be used for cpr-exec migration mode in a subsequent patch, and contains forward references to cpr-exec mode in the qapi doc. No functional change, except that cpr-exec-command is shown by the 'info migrate' command. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Acked-by: Markus Armbruster <armbru@redhat.com> Link: https://lore.kernel.org/r/1759332851-370353-5-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Steve Sistare	a9f9eee58b	migration: add cpr_walk_fd Add a helper to walk all CPR fd's and run a callback for each. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/1759332851-370353-3-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Steve Sistare	dc79c7d5e1	migration: multi-mode notifier Allow a notifier to be added for multiple migration modes. To allow a notifier to appear on multiple per-node lists, use a generic list type. We can no longer use NotifierWithReturnList, because it shoe horns the notifier onto a single list. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/1759332851-370353-2-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Daniel P. Berrangé	a5bc1ccca9	migration: simplify error reporting after channel read The code handling the return value of qio_channel_read processes len == 0 (EOF) separately from len < 1 (error), but in both cases ends up calling qemu_file_set_error_obj() with -EIO as the errno. This logic can be merged into one codepath to simplify it. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Prasad Pandit <pjp@fedoraproject.org> Link: https://lore.kernel.org/r/20250801170212.54409-2-berrange@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Juraj Marcin	725a9e5f78	migration: Fix state transition in postcopy_start() error handling Commit `4881411136` ("migration: Always set DEVICE state") introduced DEVICE state to postcopy, which moved the actual state transition that leads to POSTCOPY_ACTIVE. However, the error handling part of the postcopy_start() function still expects the state POSTCOPY_ACTIVE, but depending on where an error happens, now the state can be either ACTIVE, DEVICE or CANCELLING, but never POSTCOPY_ACTIVE, as this transition now happens just before a successful return from the function. Instead, accept any state except CANCELLING when transitioning to FAILED state. Cc: qemu-stable@nongnu.org Fixes: `4881411136` ("migration: Always set DEVICE state") Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250826115145.871272-1-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Peter Xu	82f038d596	migration/multifd/tls: Cleanup BYE message processing on sender side This patch is a trivial cleanup to the BYE messages on the multifd sender side. It could also be a fix, but since we do not have a solid clue, taking this as a cleanup only. One trivial concern is, migration_tls_channel_end() might be unsafe to be invoked in the migration thread if migration is not successful, because when failed / cancelled we do not know whether the multifd sender threads can be writing to the channels, while GnuTLS library (when it's a TLS channel) logically doesn't support concurrent writes. When at it, cleanup on a few things. What changed: - Introduce a helper to do graceful shutdowns with rich comment, hiding the details - Only send bye() iff migration succeeded, skip if it failed / cancelled - Detect TLS channel using channel type rather than thread created flags - Move the loop into the existing one that will close the channels, but do graceful shutdowns before channel shutdowns - local_err seems to have been leaked if set, fix it along the way Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250925201601.290546-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Bin Guo	2aae717122	migration: HMP: Adjust the order of output fields Adjust the positions of 'tls-authz' and 'max-postcopy-bandwidth' in the fields output by the 'info migrate_parameters' command so that related fields are next to each other. For clarity only, no functional changes. Sample output after this commit: (qemu) info migrate_parameters ... max-cpu-throttle: 99 tls-creds: '' tls-hostname: '' tls-authz: '' max-bandwidth: 134217728 bytes/second avail-switchover-bandwidth: 0 bytes/second max-postcopy-bandwidth: 0 bytes/second downtime-limit: 300 ms ... Cc: Dr. David Alan Gilbert <dave@treblig.org> Signed-off-by: Bin Guo <guobin@linux.alibaba.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20250929021213.28369-1-guobin@linux.alibaba.com [peterx: move postcopy-bw before avail-switchover-bw] Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Peter Xu	dc487044d5	migration: Make migration_has_failed() work even for CANCELLING No issue I hit, the change is only from code observation when I am looking at a TLS premature termination issue. We set CANCELLED very late, which means migration_has_failed() may not work correctly if it's invoked before updating CANCELLING to CANCELLED. Allowing that state makes migration_has_failed() work as expected even if it's invoked slightly earlier. One current user is the multifd code for the TLS graceful termination, where it's before updating to CANCELLED. Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250918203937.200833-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	40de712a89	migration: Add error-parameterized function variants in VMSD struct - We need to have good error reporting in the callbacks in VMStateDescription struct. Specifically pre_save, pre_load and post_load callbacks. - It is not possible to change these functions everywhere in one patch, therefore, we introduce a duplicate set of callbacks with Error object passed to them. - So, in this commit, we implement 'errp' variants of these callbacks, introducing an explicit Error object parameter. - This is a functional step towards transitioning the entire codebase to the new error-parameterized functions. - Deliberately called in mutual exclusion from their counterparts, to prevent conflicts during the transition. - New impls should preferentially use 'errp' variants of these methods, and existing impls incrementally converted. The variants without 'errp' are intended to be removed once all usage is converted. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-26-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	6f9fc6f501	migration: Remove error variant of vmstate_save_state() function This commit removes the redundant vmstate_save_state_with_err() function. Previously, commit `969298f9d7` introduced vmstate_save_state_with_err() to handle error propagation, while vmstate_save_state() existed for non-error scenarios. This is because there were code paths where vmstate_save_state_v() (called internally by vmstate_save_state) did not explicitly set errors on failure. This change unifies error handling by - updating vmstate_save_state() to accept an Error **errp argument. - vmstate_save_state_v() ensures errors are set directly within the errp object, eliminating the need for two separate functions. All calls to vmstate_save_state_with_err() are replaced with vmstate_save_state(). This simplifies the API and improves code maintainability. vmstate_save_state() that only calls vmstate_save_state_v(), by inference, also has errors set in errp in case of failure. The errors are reported using error_report_err(). If we want the function to exit on error, then &error_fatal is passed. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-24-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	94272d9b45	migration: Capture error in postcopy_ram_listen_thread() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. postcopy_ram_listen_thread() calls qemu_loadvm_state_main() to load the vm, and in case of a failure, it should set the error in the migration object. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-23-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	aa77746602	migration: push Error **errp into loadvm_postcopy_handle_switchover_start() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_postcopy_handle_switchover_start() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-22-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	d865e4aabd	migration: push Error **errp into loadvm_process_enable_colo() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_process_enable_colo() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-21-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00

2606 commits