qemu-cr16

Author	SHA1	Message	Date
Juraj Marcin	7b842fe354	migration: Introduce POSTCOPY_DEVICE state Currently, when postcopy starts, the source VM starts switchover and sends a package containing the state of all non-postcopiable devices. When the destination loads this package, the switchover is complete and the destination VM starts. However, if the device state load fails or the destination side crashes, the source side is already in POSTCOPY_ACTIVE state and cannot be recovered, even when it has the most up-to-date machine state as the destination has not yet started. This patch introduces a new POSTCOPY_DEVICE state which is active while the destination machine is loading the device state, is not yet running, and the source side can be resumed in case of a migration failure. Return-path is required for this state to function, otherwise it will be skipped in favor of POSTCOPY_ACTIVE. To transition from POSTCOPY_DEVICE to POSTCOPY_ACTIVE, the source side uses a PONG message that is a response to a PING message processed just before the POSTCOPY_RUN command that starts the destination VM. Thus, this feature is effective even if the destination side does not yet support this new state. Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20251103183301.3840862-9-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Juraj Marcin	67474ebacc	migration: Introduce postcopy incoming setup and cleanup functions After moving postcopy_ram_listen_thread() to postcopy file, this patch introduces a pair of functions, postcopy_incoming_setup() and postcopy_incoming_cleanup(). These functions encapsulate setup and cleanup of all incoming postcopy resources, postcopy-ram and postcopy listen thread. Furthermore, this patch also renames the postcopy_ram_listen_thread to postcopy_listen_thread, as this thread handles not only postcopy-ram, but also dirty-bitmaps and in the future it could handle other postcopiable devices. Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20251103183301.3840862-5-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Juraj Marcin	c9aac1ae10	migration: Move postcopy_ram_listen_thread() to postcopy-ram.c This patch addresses a TODO about moving postcopy_ram_listen_thread() to postcopy file. Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20251103183301.3840862-4-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Juraj Marcin	1529ec8f5f	migration: Flush migration channel after sending data of CMD_PACKAGED If the length of the data sent after CMD_PACKAGED is just right, and there is not much data to send afterward, it is possible part of the CMD_PACKAGED payload will get left behind in the sending buffer. This causes the destination side to hang while it tries to load the whole package and initiate postcopy. Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20251103183301.3840862-2-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Peter Xu	6f190736bf	migration: Fix error leak in postcopy_ram_listen_thread() As reported and analyzed by Peter: https://lore.kernel.org/r/CAFEAcA9otBWtR7rPQ0Y9aBm+7ZWJzd4VWpXrAmGr8XspPn+zpw@mail.gmail.com Fix it by freeing the error. When at it, always reset the local_err pointer in both paths. Cc: Arun Menon <armenon@redhat.com> Resolves: Coverity CID 1641390 Fixes: `94272d9b45` ("migration: Capture error in postcopy_ram_listen_thread()") Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20251021220407.2662288-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:09 -05:00
Fabiano Rosas	112b55f0b0	migration/savevm: Add a compatibility check for capabilities It has always been possible to enable arbitrary migration capabilities and attempt to take a snapshot of the VM with the savevm/loadvm commands as well as their QMP counterparts snapshot-save/snapshot-load. Most migration capabilities are not meant to be used with snapshots and there's a risk of crashing QEMU or producing incorrect behavior. Ideally, every migration capability would either be implemented for savevm or explicitly rejected. Add a compatibility check routine and reject the snapshot command if an incompatible capability is enabled. For now only act on the the two that actually cause a crash: multifd and mapped-ram. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2881 Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20251007184213.5990-1-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:09 -05:00
Arun Menon	6f9fc6f501	migration: Remove error variant of vmstate_save_state() function This commit removes the redundant vmstate_save_state_with_err() function. Previously, commit `969298f9d7` introduced vmstate_save_state_with_err() to handle error propagation, while vmstate_save_state() existed for non-error scenarios. This is because there were code paths where vmstate_save_state_v() (called internally by vmstate_save_state) did not explicitly set errors on failure. This change unifies error handling by - updating vmstate_save_state() to accept an Error **errp argument. - vmstate_save_state_v() ensures errors are set directly within the errp object, eliminating the need for two separate functions. All calls to vmstate_save_state_with_err() are replaced with vmstate_save_state(). This simplifies the API and improves code maintainability. vmstate_save_state() that only calls vmstate_save_state_v(), by inference, also has errors set in errp in case of failure. The errors are reported using error_report_err(). If we want the function to exit on error, then &error_fatal is passed. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-24-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	94272d9b45	migration: Capture error in postcopy_ram_listen_thread() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. postcopy_ram_listen_thread() calls qemu_loadvm_state_main() to load the vm, and in case of a failure, it should set the error in the migration object. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-23-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	aa77746602	migration: push Error **errp into loadvm_postcopy_handle_switchover_start() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_postcopy_handle_switchover_start() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-22-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	d865e4aabd	migration: push Error **errp into loadvm_process_enable_colo() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_process_enable_colo() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-21-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	97c2fad858	migration: push Error **errp into loadvm_handle_recv_bitmap() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_handle_recv_bitmap() must report an error in errp, in case of failure. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-19-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	fd487d7fc7	migration: push Error **errp into loadvm_postcopy_ram_handle_discard() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_postcopy_ram_handle_discard() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-18-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	ff6ae44e00	migration: push Error **errp into loadvm_postcopy_handle_run() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_postcopy_handle_run() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-17-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	208e522766	migration: push Error **errp into loadvm_postcopy_handle_listen() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_postcopy_handle_listen() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-16-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	34a94fe28d	migration: push Error **errp into loadvm_postcopy_handle_advise() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_postcopy_handle_advise() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-15-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	44cdbaa98e	migration: push Error **errp into ram_postcopy_incoming_init() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that ram_postcopy_incoming_init() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-14-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	3cd7a260e5	migration: make loadvm_postcopy_handle_resume() void This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. Use warn_report() instead of error_report(); it ensures that a resume command received while the migration is not in postcopy recover state is not fatal. It only informs that the command received is unusual, and therefore we should not set errp with the error string. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-13-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	3f9d6e77b0	migration: Update qemu_file_get_return_path() docs and remove dead checks The documentation of qemu_file_get_return_path() states that it can return NULL on failure. However, a review of the current implementation reveals that it is guaranteed that it will always succeed and will never return NULL. As a result, the NULL checks post calling the function become redundant. This commit updates the documentation for the function and removes all NULL checks throughout the migration code. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-12-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	8b6dad124b	migration: push Error **errp into qemu_loadvm_section_part_end() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that qemu_loadvm_section_part_end() must report an error in errp, in case of failure. This patch also removes the setting of errp when errp is NULL in the out section as it is no longer required in the series. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-11-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	4493752653	migration: push Error **errp into qemu_loadvm_section_start_full() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that qemu_loadvm_section_start_full() must report an error in errp, in case of failure. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-10-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	deecb4e713	migration: push Error **errp into qemu_loadvm_state_main() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that qemu_loadvm_state_main() must report an error in errp, in case of failure. Set errp explicitly if it is NULL in case of failure in the out section. This will be removed in the subsequent patch when all of the calls are converted to passing errp. The error message in the default case of qemu_loadvm_state_main() has the word "savevm". This is removed because it can confuse the user while reading destination side error logs. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-9-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	84279d5dc1	migration: push Error **errp into qemu_load_device_state() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that qemu_load_device_state() must report an error in errp, in case of failure. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-8-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	9535435795	migration: push Error **errp into qemu_loadvm_state() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that qemu_loadvm_state() must report an error in errp, in case of failure. When postcopy live migration runs, the device states are loaded by both the qemu coroutine process_incoming_migration_co() and the postcopy_ram_listen_thread(). Therefore, it is important that the coroutine also reports the error in case of failure, with error_report_err(). Otherwise, the source qemu will not display any errors before going into the postcopy pause state. Suggested-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-7-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	829f0d238d	migration: push Error **errp into loadvm_handle_cmd_packaged() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_handle_cmd_packaged() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-6-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	5fb0cfbb35	migration: push Error **errp into loadvm_process_command() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_process_command() must report an error in errp, in case of failure. The errors are temporarily reported using error_report_err(). This is removed in the subsequent patches in this series when we are actually able to propagate the error to the calling function. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-5-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	e97da6b583	migration: push Error **errp into vmstate_load() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that vmstate_load() must report an error in errp, in case of failure. The errors are temporarily reported using error_report_err(). This is removed in the subsequent patches in this series when we are actually able to propagate the error to the calling function. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-4-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:01 -04:00
Arun Menon	71cf63a471	migration: push Error **errp into qemu_loadvm_state_header() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that qemu_loadvm_state_header() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-3-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:01 -04:00
Arun Menon	c632ffbd74	migration: push Error **errp into vmstate_load_state() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that vmstate_load_state() must report an error in errp, in case of failure. The errors are temporarily reported using error_report_err(). This is removed in the subsequent patches in this series, when we are actually able to propagate the error to the calling function using errp. Whereas, if we want the function to exit on error, then error_fatal is passed. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-2-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:01 -04:00
Vladimir Sementsov-Ogievskiy	fe6a74f365	migration: qemu_file_set_blocking(): add errp parameter qemu_file_set_blocking() is a wrapper on qio_channel_set_blocking(), so let's passthrough the errp. Note the migration should not be using &error_abort in these calls, however, this is done to expedite the API conversion. The original code would have eventually ended up calling either qemu_socket_set_nonblock which would asset on Linux, or g_unix_set_fd_nonblocking which would propagate errors. We never saw asserts in practice, and conceptually they should not happen, but ideally this code will be later adapted to remove use of &error_abort. Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2025-09-19 12:46:07 +01:00
Juraj Marcin	beeac2df5f	migration: Rename save_live_complete_precopy_thread to save_complete_precopy_thread Recent patch [1] renames the save_live_complete_precopy handler to save_complete, as the machine is not live in most cases when this handler is executed. The same is true also for save_live_complete_precopy_thread, therefore this patch removes the "live" keyword from the handler itself and related types to keep the naming unified. In contrast to save_complete, this handler is only executed at the end of precopy, therefore the "precopy" keyword is retained. [1]: https://lore.kernel.org/all/20250613140801.474264-7-peterx@redhat.com/ Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Cédric Le Goater <clg@redhat.com> Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250626085235.294690-1-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:39 -03:00
Peter Xu	7dd95a9630	migration: qemu_savevm_complete*() helpers Since we use the same save_complete() hook for both precopy and postcopy, add a set of helpers to invoke the hook() to dedup the code. Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613140801.474264-8-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:36 -03:00
Peter Xu	57c43e52bd	migration: Rename save_live_complete_precopy to save_complete Now after merging the precopy and postcopy version of complete() hook, rename the precopy version from save_live_complete_precopy() to save_complete(). Dropping the "live" when at it, because it's in most cases not live when happening (in precopy). No functional change intended. Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613140801.474264-7-peterx@redhat.com [peterx: squash the fixup that covers a few more doc spots, per Juraj] Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:36 -03:00
Peter Xu	d7530a9682	migration: Drop save_live_complete_postcopy hook The hook is only defined in two vmstate users ("ram" and "block dirty bitmap"), meanwhile both of them define the hook exactly the same as the precopy version. Hence, this postcopy version isn't needed. No functional change intended. Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613140801.474264-6-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:35 -03:00
Steve Sistare	081c09dc52	migration: lower handler priority Define a vmstate priority that is lower than the default, so its handlers run after all default priority handlers. Since 0 is no longer the default priority, translate an uninitialized priority of 0 to MIG_PRI_DEFAULT. CPR for vfio will use this to install handlers for containers that run after handlers for the devices that they contain. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/qemu-devel/1749569991-25171-3-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2025-06-11 14:01:58 +02:00
Akihiko Odaki	d401b7aed8	migration/postcopy: Replace QemuSemaphore with QemuEvent thread_sync_sem is an one-shot event so it can be converted into QemuEvent, which is more lightweight. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250529-event-v5-9-53b285203794@daynix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-06-06 14:32:55 +02:00
Peter Xu	1d48111601	migration: Add save_postcopy_prepare() savevm handler Add a savevm handler for a module to opt-in sending extra sections right before postcopy starts, and before VM is stopped. RAM will start to use this new savevm handler in the next patch to do flush and sync for multifd pages. Note that we choose to do it before VM stopped because the current only potential user is not sensitive to VM status, so doing it before VM is stopped is preferred to enlarge any postcopy downtime. It is still a bit unfortunate that we need to introduce such a new savevm handler just for the only use case, however it's so far the cleanest. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-ID: <20250411114534.3370816-4-ppandit@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-05-02 11:09:36 -04:00
Richard Henderson	12eeb04ab4	page-vary: Move and rename qemu_target_page_bits_min Rename to migration_legacy_page_bits, to make it clear that we cannot change the value without causing a migration break. Move to page-vary.h and page-vary-target.c. Define via TARGET_PAGE_BITS if not TARGET_PAGE_BITS_VARY. Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2025-04-23 15:04:57 -07:00
Richard Henderson	8be545ba5a	include/system: Move exec/memory.h to system/memory.h Convert the existing includes with sed -i ,exec/memory.h,system/memory.h,g Move the include within cpu-all.h into a !CONFIG_USER_ONLY block. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2025-04-23 14:08:21 -07:00
Steve Sistare	094a3dbc55	migration: ram block cpr blockers Unlike cpr-reboot mode, cpr-transfer mode cannot save volatile ram blocks in the migration stream file and recreate them later, because the physical memory for the blocks is pinned and registered for vfio. Add a blocker for volatile ram blocks. Also add a blocker for RAM_GUEST_MEMFD. Preserving guest_memfd may be sufficient for CPR, but it has not been tested yet. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-ID: <1740667681-257312-1-git-send-email-steven.sistare@oracle.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-03-10 12:09:24 -03:00
Maciej S. Szmigiero	8305921a91	migration: Add save_live_complete_precopy_thread handler This SaveVMHandler helps device provide its own asynchronous transmission of the remaining data at the end of a precopy phase via multifd channels, in parallel with the transfer done by save_live_complete_precopy handlers. These threads are launched only when multifd device state transfer is supported. Management of these threads in done in the multifd migration code, wrapping them in the generic thread pool. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/qemu-devel/eac74a4ca7edd8968bbf72aa07b9041c76364a16.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2025-03-06 06:47:33 +01:00
Maciej S. Szmigiero	b1937fd1eb	migration: Add thread pool of optional load threads Some drivers might want to make use of auxiliary helper threads during VM state loading, for example to make sure that their blocking (sync) I/O operations don't block the rest of the migration process. Add a migration core managed thread pool to facilitate this use case. The migration core will wait for these threads to finish before (re)starting the VM at destination. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/b09fd70369b6159c75847e69f235cb908b02570c.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2025-03-06 06:47:33 +01:00
Maciej S. Szmigiero	6a76eb4872	migration: Always take BQL for migration_incoming_state_destroy() All callers to migration_incoming_state_destroy() other than postcopy_ram_listen_thread() do this call with BQL held. Since migration_incoming_state_destroy() ultimately calls "load_cleanup" SaveVMHandlers and it will soon call BQL-sensitive code it makes sense to always call that function under BQL rather than to have it deal with both cases (with BQL and without BQL). Add the necessary bql_lock() and bql_unlock() to postcopy_ram_listen_thread(). qemu_loadvm_state_main() in postcopy_ram_listen_thread() could call "load_state" SaveVMHandlers that are expecting BQL to be held. In principle, the only devices that should be arriving on migration channel serviced by postcopy_ram_listen_thread() are those that are postcopiable and whose load handlers are safe to be called without BQL being held. But nothing currently prevents the source from sending data for "unsafe" devices which would cause trouble there. Add a TODO comment there so it's clear that it would be good to improve handling of such (erroneous) case in the future. Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/21bb5ca337b1d5a802e697f553f37faf296b5ff4.1741193259.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2025-03-06 06:47:33 +01:00
Maciej S. Szmigiero	a30363db08	migration: Add qemu_loadvm_load_state_buffer() and its handler qemu_loadvm_load_state_buffer() and its load_state_buffer SaveVMHandler allow providing device state buffer to explicitly specified device via its idstr and instance id. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/71ca753286b87831ced4afd422e2e2bed071af25.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2025-03-06 06:47:33 +01:00
Maciej S. Szmigiero	4e55cb3cde	migration: Add MIG_CMD_SWITCHOVER_START and its load handler This QEMU_VM_COMMAND sub-command and its switchover_start SaveVMHandler is used to mark the switchover point in main migration stream. It can be used to inform the destination that all pre-switchover main migration stream data has been sent/received so it can start to process post-switchover data that it might have received via other migration channels like the multifd ones. Add also the relevant MigrationState bit stream compatibility property and its hw_compat entry. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Zhang Chen <zhangckid@gmail.com> # for the COLO part Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/311be6da85fc7e49a7598684d80aa631778dcbce.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2025-03-06 06:47:33 +01:00
Fabiano Rosas	e0ad300fe1	migration: Check migration error after loadvm We're currently only checking the QEMUFile error after qemu_loadvm_state(). This was causing a TLS termination error from multifd recv threads to be ignored. Start checking the migration error as well to avoid missing further errors. Regarding compatibility concerning the TLS termination error that was being ignored, for QEMUs <= 9.2 - if the old QEMU is being used as migration source - the recently added migration property multifd-tls-clean-termination needs to be set to OFF in the destination machine. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-02-14 15:19:04 -03:00
Peter Xu	455c1963d3	migration: Cleanup qemu_savevm_state_complete_precopy() Now qemu_savevm_state_complete_precopy() is never used in postcopy, clean it up as in_postcopy==false now unconditionally. Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250114230746.3268797-14-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-01-29 11:56:41 -03:00
Peter Xu	15c2ffa0b7	migration: Unwrap qemu_savevm_state_complete_precopy() in postcopy Postcopy invokes qemu_savevm_state_complete_precopy() twice for a long time, and that caused way too much confusions. Let's clean this up and make postcopy easier to read. It's actually fairly straightforward: postcopy starts with saving non-postcopiable iterables, then later it saves again with non-iterable only. Move these two calls out makes everything much easier to follow. Otherwise it's very unclear what qemu_savevm_state_complete_precopy() did in either of the calls. No functional change intended. Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250114230746.3268797-13-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-01-29 11:56:41 -03:00
Peter Xu	46b0155ecf	migration: Notify COMPLETE once for postcopy Postcopy invokes qemu_savevm_state_complete_precopy() twice, that means it'll invoke COMPLETE notify twice.. also twice the tracepoints that marking precopy complete. Move that notification (along with the tracepoint) out to the caller, so that postcopy will only notify once right at the start of switchover phase from precopy. When at it, rename it to suite the file now it locates. For precopy, there should have no functional change except the tracepoint has a name change. For the other two users of qemu_savevm_state_complete_precopy(), namely: qemu_savevm_state() and qemu_savevm_live_state(): the notifier shouldn't matter because they're not precopy at all. Now in these two contexts (aka, "savevm", and "colo") sometimes the precopy notifiers will still be invoked, but that's outside the scope of this patch. Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250114230746.3268797-12-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-01-29 11:56:41 -03:00
Peter Xu	89011a702f	migration: Synchronize all CPU states only for non-iterable dump Do one shot cpu sync at qemu_savevm_state_complete_precopy_non_iterable(), instead of coding it separately in two places. Note that in the context of qemu_savevm_state_complete_precopy(), this patch is also an optimization for postcopy path, in that we can avoid sync cpu twice during switchover: before this patch, postcopy_start() invokes twice on qemu_savevm_state_complete_precopy(), each of them will try to sync CPU info. In reality, only one of them would be enough. For background snapshot, there's no intended functional change. Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250114230746.3268797-7-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-01-29 11:56:40 -03:00
Peter Xu	4822128693	migration: Drop inactivate_disk param in qemu_savevm_state_complete* This parameter is only used by one caller, which is the genuine precopy complete path (migration_completion_precopy). The parameter was introduced in `a1fbe750fd` ("migration: Fix race of image locking between src and dst") to make sure the inactivate will happen before EOF to make sure dest will always be able to activate the disk properly. However there's no limitation on how early we inactivate the disk. For precopy completion path, we can always do that as long as VM is stopped. Move the disk inactivate there, then we can remove this inactivate_disk parameter in the whole call stack, because all the rest users pass in false always. Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250114230746.3268797-6-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-01-29 11:56:40 -03:00

1 2 3 4 5 ...

428 commits