qemu-cr16

Author	SHA1	Message	Date
Matthew Rosato	911bdd34ca	migration: set correct list pointer when removing notifier In migration_remove_notifier(), g_slist_remove() will search for and potentially remove an entry from the specified list. The return value should be used to update the potentially-changed head pointer of the list that was just searched (migration_state_notifiers[mode]) instead of the migration blockers list. Fixes: `dc79c7d5e1` ("migration: multi-mode notifier") Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20251113213545.513453-1-mjrosato@linux.ibm.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-21 10:33:21 -05:00
Li Zhijian	0b5bf4ea76	migration: Fix transition to COLO state from precopy Commit `4881411136` ("migration: Always set DEVICE state") set a new DEVICE state before completed during migration, which broke the original transition to COLO. The migration flow for precopy has changed to: active -> pre-switchover -> device -> completed. This patch updates the transition state to ensure that the Pre-COLO state corresponds to DEVICE state correctly. Cc: qemu-stable <qemu-stable@nongnu.org> Fixes: `4881411136` ("migration: Always set DEVICE state") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Zhang Chen <zhangckid@gmail.com> Tested-by: Zhang Chen <zhangckid@gmail.com> Link: https://lore.kernel.org/r/20251104013606.1937764-1-lizhijian@fujitsu.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-21 10:33:21 -05:00
Juraj Marcin	7b842fe354	migration: Introduce POSTCOPY_DEVICE state Currently, when postcopy starts, the source VM starts switchover and sends a package containing the state of all non-postcopiable devices. When the destination loads this package, the switchover is complete and the destination VM starts. However, if the device state load fails or the destination side crashes, the source side is already in POSTCOPY_ACTIVE state and cannot be recovered, even when it has the most up-to-date machine state as the destination has not yet started. This patch introduces a new POSTCOPY_DEVICE state which is active while the destination machine is loading the device state, is not yet running, and the source side can be resumed in case of a migration failure. Return-path is required for this state to function, otherwise it will be skipped in favor of POSTCOPY_ACTIVE. To transition from POSTCOPY_DEVICE to POSTCOPY_ACTIVE, the source side uses a PONG message that is a response to a PING message processed just before the POSTCOPY_RUN command that starts the destination VM. Thus, this feature is effective even if the destination side does not yet support this new state. Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20251103183301.3840862-9-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Juraj Marcin	468a7effde	migration: Refactor all incoming cleanup into migration_incoming_destroy() Currently, there are two functions that are responsible for calling the cleanup of the incoming migration state. With successful precopy, it's the incoming migration coroutine, and with successful postcopy it's the postcopy listen thread. However, if postcopy fails during the device load, both functions will try to do the cleanup. This patch refactors all cleanup that needs to be done on the incoming side into a common function and defines a clear boundary for who is responsible for the cleanup. The incoming migration coroutine is responsible for calling the cleanup function, unless the listen thread has been started, in which case the postcopy listen thread runs the incoming migration cleanup in its BH. Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Fixes: `9535435795` ("migration: push Error **errp into qemu_loadvm_state()") Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20251103183301.3840862-6-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Juraj Marcin	67474ebacc	migration: Introduce postcopy incoming setup and cleanup functions After moving postcopy_ram_listen_thread() to the postcopy file, this patch introduces a pair of functions, postcopy_incoming_setup() and postcopy_incoming_cleanup(). These functions encapsulate setup and cleanup of all incoming postcopy resources, postcopy-ram and postcopy listen thread. Furthermore, this patch also renames postcopy_ram_listen_thread to postcopy_listen_thread, as this thread handles not only postcopy-ram, but also dirty-bitmaps and in the future it could handle other postcopiable devices. Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20251103183301.3840862-5-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Peter Xu	26f65c01ed	migration: Do not try to start VM if disk activation fails If a rare split brain happens (e.g. dest QEMU started running somehow, taking shared drive locks), src QEMU may not be able to activate the drives anymore. In this case, src QEMU shouldn't start the VM or it might crash the block layer later. Meanwhile, src QEMU cannot try to continue either, even if dest QEMU can release the drive locks (e.g. by QMP "stop"), because once dest QEMU has started running, dest QEMU's RAM is the only version that is consistent with the current state of the shared storage. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20251103183301.3840862-3-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Markus Armbruster	5777e20718	migration: Put Error **errp parameter last qapi/error.h's big comment: - Functions that use Error to report errors have an Error **errp parameter. It should be the last parameter, except for functions taking variable arguments. is_only_migratable() and add_blockers() have it in the middle. Clean them up. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20251027064503.1074255-4-armbru@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Markus Armbruster	3ca0a0ab05	migration: Use bitset of MigMode instead of variable arguments migrate_add_blocker_modes() and migration_add_notifier_modes() use variable arguments for a set of migration modes. The variable arguments get collected into a bitset for processing. Take a bitset argument instead; it's simpler. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20251027064503.1074255-3-armbru@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Markus Armbruster	75a9f080c2	migration: Use unsigned instead of int for bit set of MigMode Signed operands in bitwise operations are unwise. I believe they're safe here, but avoiding them is easy, so let's do that. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20251027064503.1074255-2-armbru@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Peter Xu	ded3cf4aaf	migration/cpr: Avoid crashing QEMU when cpr-exec runs with no args If a user invokes cpr-exec without setting the exec args first, it currently crashes QEMU. Avoid that, and fail the QMP migrate command instead. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20251021220407.2662288-5-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:10 -05:00
Peter Xu	6a65fdee8a	migration/cpr: Fix coverity report in cpr_exec_persist_state() As reported and analyzed by Peter: https://lore.kernel.org/r/CAFEAcA_mUQ2NeoguR5efrhw7XYGofnriWEA=+Dg+Ocvyam1wAw@mail.gmail.com The mfd leak is a false positive; try to use a Coverity annotation (which I couldn't find documented myself, but still give it a shot). Fix the other one by capturing the error if setenv() fails. While at it, pass the error up to the top (cpr_state_save()). Along the way, change all return values to bool where errp is present. Resolves: Coverity CID 1641391 Resolves: Coverity CID 1641392 Fixes: `efc6587313` ("migration: cpr-exec save and load") Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20251021220407.2662288-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-11-03 16:04:09 -05:00
Steve Sistare	a3eae205c6	migration: cpr-exec mode Add the cpr-exec migration mode. Usage: qemu-system-$arch -machine aux-ram-share=on ... migrate_set_parameter mode cpr-exec migrate_set_parameter cpr-exec-command \ <arg1> <arg2> ... -incoming <uri-1> \ migrate -d <uri-1> The migrate command stops the VM, saves state to uri-1, directly exec's a new version of QEMU on the same host, replacing the original process while retaining its PID, and loads state from uri-1. Guest RAM is preserved in place, albeit with new virtual addresses. The new QEMU process is started by exec'ing the command specified by the @cpr-exec-command parameter. The first word of the command is the binary, and the remaining words are its arguments. The command may be a direct invocation of new QEMU, or may be a non-QEMU command that exec's the new QEMU binary. This mode creates a second migration channel that is not visible to the user. At the start of migration, old QEMU saves CPR state to the second channel, and at the end of migration, it tells the main loop to call cpr_exec. New QEMU loads CPR state early, before objects are created. Because old QEMU terminates when new QEMU starts, one cannot stream data between the two, so uri-1 must be a type, such as a file, that accepts all data before old QEMU exits. Otherwise, old QEMU may quietly block writing to the channel. Memory-backend objects must have the share=on attribute, but memory-backend-epc is not supported. The VM must be started with the '-machine aux-ram-share=on' option, which allows anonymous memory to be transferred in place to the new process. The memfds are kept open across exec by clearing the close-on-exec flag, their values are saved in CPR state, and they are mmap'd in new QEMU. 
Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Acked-by: Markus Armbruster <armbru@redhat.com> Link: https://lore.kernel.org/r/1759332851-370353-7-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Steve Sistare	dc79c7d5e1	migration: multi-mode notifier Allow a notifier to be added for multiple migration modes. To allow a notifier to appear on multiple per-mode lists, use a generic list type. We can no longer use NotifierWithReturnList, because it shoehorns the notifier onto a single list. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/1759332851-370353-2-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Juraj Marcin	725a9e5f78	migration: Fix state transition in postcopy_start() error handling Commit `4881411136` ("migration: Always set DEVICE state") introduced DEVICE state to postcopy, which moved the actual state transition that leads to POSTCOPY_ACTIVE. However, the error handling part of the postcopy_start() function still expects the state POSTCOPY_ACTIVE, but depending on where an error happens, now the state can be either ACTIVE, DEVICE or CANCELLING, but never POSTCOPY_ACTIVE, as this transition now happens just before a successful return from the function. Instead, accept any state except CANCELLING when transitioning to FAILED state. Cc: qemu-stable@nongnu.org Fixes: `4881411136` ("migration: Always set DEVICE state") Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250826115145.871272-1-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Peter Xu	dc487044d5	migration: Make migration_has_failed() work even for CANCELLING I did not hit an issue; the change comes purely from code observation while I was looking at a TLS premature termination issue. We set CANCELLED very late, which means migration_has_failed() may not work correctly if it's invoked before CANCELLING is updated to CANCELLED. Allowing that state makes migration_has_failed() work as expected even if it's invoked slightly earlier. One current user is the multifd code for TLS graceful termination, which runs before the state is updated to CANCELLED. Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250918203937.200833-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	d865e4aabd	migration: push Error **errp into loadvm_process_enable_colo() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_process_enable_colo() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-21-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	3f9d6e77b0	migration: Update qemu_file_get_return_path() docs and remove dead checks The documentation of qemu_file_get_return_path() states that it can return NULL on failure. However, a review of the current implementation reveals that it is guaranteed that it will always succeed and will never return NULL. As a result, the NULL checks post calling the function become redundant. This commit updates the documentation for the function and removes all NULL checks throughout the migration code. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-12-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Arun Menon	9535435795	migration: push Error **errp into qemu_loadvm_state() This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that qemu_loadvm_state() must report an error in errp, in case of failure. When postcopy live migration runs, the device states are loaded by both the qemu coroutine process_incoming_migration_co() and the postcopy_ram_listen_thread(). Therefore, it is important that the coroutine also reports the error in case of failure, with error_report_err(). Otherwise, the source qemu will not display any errors before going into the postcopy pause state. Suggested-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-7-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-10-03 09:48:02 -04:00
Vladimir Sementsov-Ogievskiy	fe6a74f365	migration: qemu_file_set_blocking(): add errp parameter qemu_file_set_blocking() is a wrapper on qio_channel_set_blocking(), so let's pass through the errp. Note the migration should not be using &error_abort in these calls; however, this is done to expedite the API conversion. The original code would have eventually ended up calling either qemu_socket_set_nonblock(), which would assert on Linux, or g_unix_set_fd_nonblocking(), which would propagate errors. We never saw asserts in practice, and conceptually they should not happen, but ideally this code will later be adapted to remove the use of &error_abort. Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2025-09-19 12:46:07 +01:00
Peter Xu	d2a81ca8c6	migration/postcopy: Push blocktime start/end into page req mutex The postcopy blocktime feature was tricky in that it used quite a few atomic operations over quite a few arrays and vars, without explaining how that would be thread safe. The thread safety here is about concurrency between the fault thread and the fault resolution threads, which may access the same chunk of data. All these atomic ops can be expensive too, before it is clear how it all works. OTOH, postcopy has one page_request_mutex used to serialize the received bitmap updates. So far it's ok: we don't yet have a lot of threads contending the lock. It might change after multifd is supported, but that's a separate story. What is important is that, with that mutex, it's pretty lightweight to move all the blocktime maintenance into the mutex critical section. That's because the blocktime layer is lightweight: almost "remember which vcpu faulted on which address", and "ok we get some fault resolved, calculate how long it takes". It's also an optional feature for now (but I have thought of changing that, maybe in the future). Let's push the blocktime layer into the mutex, so that it's always thread-safe even without any atomic ops. To achieve that, I'll need to add a tid parameter on the fault path so that it'll start to pass the faulted thread ID deeper into the stack, but not too deep. While at it, add a comment for the shared fault handler (for example, vhost-user devices running with postcopy), to mention a TODO. One reason it might not be trivial is that vhost-user's userfaultfds should be opened by the vhost-user process, so it's pretty hard to make sure the TID feature will be available. It wasn't supported before, so keep it like that for now. Now we can be at ease, since everything is protected by a mutex that we always take anyway. 
One side effect: we can finally remove one ramblock_recv_bitmap_test() in mark_postcopy_blocktime_begin(), which was pretty weird and which also included a weird (but maybe necessary.. but maybe not?) operation to inject a blocktime entry and then quickly erase it. Now that we hold the mutex, and make sure it's invoked after checking the receive bitmap, it's not needed anymore. Instead, we assert. As another side effect, this paves the way for removing all atomic ops from the memory accesses in the blocktime layer. Note that we need a stub for mark_postcopy_blocktime_begin() for Windows builds. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:37 -03:00
Peter Xu	7aaa1fc072	migration: Rewrite the migration complete detect logic There are a few things off in that logic; rewrite it. While at it, add detailed comments to explain each of the decisions. Since this is a very sensitive path for migration, below is the list of things changed with their reasoning. (1) Exact pending size is only needed for precopy, not postcopy Fundamentally it's because the "exact" version only does one more deep sync to fetch the pending results, while in postcopy's case it's never going to sync anything more than the estimate, as the VM on the source is stopped. (2) Do _not_ rely on threshold_size anymore to decide whether postcopy should complete threshold_size was calculated from the expected downtime and bandwidth only during precopy as an efficient way to decide when to switch over. It's not sensible to rely on threshold_size in postcopy. For precopy, if switchover is decided, the migration will complete soon. That's not true for postcopy. Logically speaking, postcopy should only complete the migration if all pending data is flushed. It used to work because save_complete() implicitly contained save_live_iterate() when there was pending size. Even if that looks benign, migrating RAM in postcopy's save_complete() has other bad side effects: (a) Since save_complete() needs to be run once at a time, it means that while moving RAM there's no way to move other things (unlike round-robin iterating the vmstate handlers as we do in the ITERABLE phase). Not an immediate concern, but it may stop working in the future when there is more than one iterable (e.g. vfio postcopy). (b) postcopy recovery, unfortunately, only works during the ITERABLE phase. IOW, if src QEMU moves RAM during postcopy's save_complete() and the network fails, then it'll crash both QEMUs... OTOH if it fails during iteration it'll still be recoverable. IOW, this change should further reduce the window for QEMU split brain and crash in extreme cases. 
If we enable the ram_save_complete() tracepoints, we'll see this before this patch: 1267959@1748381938.294066:ram_save_complete dirty=9627, done=0 1267959@1748381938.308884:ram_save_complete dirty=0, done=1 It means that in this migration 9627 pages were migrated at complete() of the postcopy phase. After this change, all the postcopy RAM should be migrated in the iterable phase, rather than in save_complete(): 1267959@1748381938.294066:ram_save_complete dirty=0, done=0 1267959@1748381938.308884:ram_save_complete dirty=0, done=1 (3) Adjust when to decide to switch to postcopy This shouldn't be super important; the movement makes sure there's only one in_postcopy check, so we are clear on what we do with the two completely different use cases (precopy vs. postcopy). (4) Trivial touch up on threshold_size comparison Which changes: "(!pending_size || pending_size < s->threshold_size)" into: "(pending_size <= s->threshold_size)" Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613140801.474264-11-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:36 -03:00
Peter Xu	2145f38c31	migration/bg-snapshot: Do not check for SKIP in iterator This cannot happen in the bg-snapshot case. Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250613140801.474264-5-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-07-11 10:37:35 -03:00
Akihiko Odaki	952691b7a6	migration: Replace QemuSemaphore with QemuEvent pause_event can utilize qemu_event_reset() to discard events. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20250529-event-v5-7-53b285203794@daynix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-06-06 14:32:55 +02:00
Peter Xu	1d48111601	migration: Add save_postcopy_prepare() savevm handler Add a savevm handler for a module to opt in to sending extra sections right before postcopy starts, and before the VM is stopped. RAM will start to use this new savevm handler in the next patch to do flush and sync for multifd pages. Note that we choose to do it before the VM is stopped because the only current potential user is not sensitive to VM status, so doing it before the VM is stopped is preferred to avoid enlarging the postcopy downtime. It is still a bit unfortunate that we need to introduce such a new savevm handler just for the only use case; however, it's so far the cleanest. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-ID: <20250411114534.3370816-4-ppandit@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-05-02 11:09:36 -04:00
Prasad Pandit	00f3fcef19	migration: refactor channel discovery mechanism The various logical migration channels don't have a standardized way of advertising themselves and their connections may be seen out of order by the migration destination. When a new connection arrives, the incoming migration currently makes use of heuristics to determine which channel it belongs to. The next few patches will need to change how the multifd and postcopy capabilities interact and that affects the channel discovery heuristic. Refactor the channel discovery heuristic to make it less opaque and simplify the subsequent patches. Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-ID: <20250411114534.3370816-3-ppandit@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-05-02 11:09:36 -04:00
Li Zhijian	57be554c29	migration: check RDMA and capabilities are compatible on both sides Depending on the order of starting RDMA and setting capability, they can be categorized into the following scenarios: Source: S1: [set capabilities] -> [Start RDMA outgoing] Destination: D1: [set capabilities] -> [Start RDMA incoming] D2: [Start RDMA incoming] -> [set capabilities] Previously, compatibility between RDMA and capabilities was verified only in scenario D1, potentially causing migration failures in other situations. For scenarios S1 and D1, we can seamlessly incorporate migration_transport_compatible() to address compatibility between channels and capabilities vs transport. For scenario D2, ensure compatibility within migrate_caps_check(). Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Message-ID: <20250305062825.772629-3-lizhijian@fujitsu.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-05-02 11:09:36 -04:00
Philippe Mathieu-Daudé	12d1a768bd	qom: Have class_init() take a const data argument Mechanical change using gsed, then style manually adapted to pass checkpatch.pl script. Suggested-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20250424194905.82506-4-philmd@linaro.org>	2025-04-25 17:00:41 +02:00
Peter Xu	d657a14de5	migration: Fix UAF for incoming migration on MigrationState On the incoming migration side, QEMU uses a coroutine to load all the VM states. Inside, it may reference MigrationState on global states like migration capabilities, parameters, error state, shared mutexes and more. However there's nothing yet to make sure MigrationState won't get destroyed (e.g. after migration_shutdown()). Meanwhile there's also no API available to remove the incoming coroutine in migration_shutdown() to prevent it from accessing the freed elements. There's a bug report showing this can happen and crash dest QEMU when migration is cancelled on source. When it happens, the dest main thread is trying to clean up everything: #0 qemu_aio_coroutine_enter #1 aio_dispatch_handler #2 aio_poll #3 monitor_cleanup #4 qemu_cleanup #5 qemu_default_main Then it finds the incoming migration coroutine and schedules it (even after migration_shutdown()), causing a crash: #0 __pthread_kill_implementation #1 __pthread_kill_internal #2 __GI_raise #3 __GI_abort #4 __assert_fail_base #5 __assert_fail #6 qemu_mutex_lock_impl #7 qemu_lockable_mutex_lock #8 qemu_lockable_lock #9 qemu_lockable_auto_lock #10 migrate_set_error #11 process_incoming_migration_co #12 coroutine_trampoline To fix it, take a refcount after an incoming setup is properly done when qmp_migrate_incoming() succeeds the 1st time. As it's during a QMP handler which needs BQL, it means the main loop is still alive (without going into cleanups, which also need BQL). The refcount is now released only when the incoming migration coroutine finishes or fails. Hence the refcount is valid for both (1) setup phase of incoming ports, mostly IO watches (e.g. qio_channel_add_watch_full()), and (2) the incoming coroutine itself (process_incoming_migration_co()). Note that we can't unref in migration_incoming_state_destroy(), because both qmp_xen_load_devices_state() and load_snapshot() will use it without an incoming migration. 
Those hold BQL so they're not prone to this issue. PS: I suspect nobody uses Xen's command at all, as it didn't register yank, hence AFAIU the command should crash on master when trying to unregister yank in migration_incoming_state_destroy().. but that's another story. Also note that in some incoming failure cases we may not always unref the MigrationState refcount, which is a trade-off to keep things simple. We could make it accurate, but it can be overkill. Some examples: - Unlike most other protocols, socket_start_incoming_migration() may create a net listener after the incoming port is set up successfully. It means we can't unref in migration_channel_process_incoming() as a generic path because the socket protocol might keep using MigrationState. - For either socket or file, multiple IO watches might be created, which means logically each IO watch needs to take one refcount for MigrationState so as to be 100% accurate on ownership of the refcount taken. In general, we at least need per-protocol handling to make it accurate, which can be overkill if we know incoming failed after all. Add a short comment to explain that when taking the refcount in qmp_migrate_incoming(). Bugzilla: https://issues.redhat.com/browse/RHEL-69775 Tested-by: Yan Fu <yafu@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-ID: <20250220132459.512610-1-peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-03-10 12:09:24 -03:00
Maciej S. Szmigiero	b1937fd1eb	migration: Add thread pool of optional load threads Some drivers might want to make use of auxiliary helper threads during VM state loading, for example to make sure that their blocking (sync) I/O operations don't block the rest of the migration process. Add a migration core managed thread pool to facilitate this use case. The migration core will wait for these threads to finish before (re)starting the VM at destination. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/b09fd70369b6159c75847e69f235cb908b02570c.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2025-03-06 06:47:33 +01:00
Maciej S. Szmigiero	6a76eb4872	migration: Always take BQL for migration_incoming_state_destroy() All callers to migration_incoming_state_destroy() other than postcopy_ram_listen_thread() do this call with BQL held. Since migration_incoming_state_destroy() ultimately calls "load_cleanup" SaveVMHandlers and it will soon call BQL-sensitive code it makes sense to always call that function under BQL rather than to have it deal with both cases (with BQL and without BQL). Add the necessary bql_lock() and bql_unlock() to postcopy_ram_listen_thread(). qemu_loadvm_state_main() in postcopy_ram_listen_thread() could call "load_state" SaveVMHandlers that are expecting BQL to be held. In principle, the only devices that should be arriving on migration channel serviced by postcopy_ram_listen_thread() are those that are postcopiable and whose load handlers are safe to be called without BQL being held. But nothing currently prevents the source from sending data for "unsafe" devices which would cause trouble there. Add a TODO comment there so it's clear that it would be good to improve handling of such (erroneous) case in the future. Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/21bb5ca337b1d5a802e697f553f37faf296b5ff4.1741193259.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2025-03-06 06:47:33 +01:00
Maciej S. Szmigiero	4e55cb3cde	migration: Add MIG_CMD_SWITCHOVER_START and its load handler This QEMU_VM_COMMAND sub-command and its switchover_start SaveVMHandler are used to mark the switchover point in the main migration stream. It can be used to inform the destination that all pre-switchover main migration stream data has been sent/received, so it can start to process post-switchover data that it might have received via other migration channels like the multifd ones. Also add the relevant MigrationState bit stream compatibility property and its hw_compat entry. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Zhang Chen <zhangckid@gmail.com> # for the COLO part Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/311be6da85fc7e49a7598684d80aa631778dcbce.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2025-03-06 06:47:33 +01:00
Fabiano Rosas	4a228bcc99	migration: Don't set FAILED state when cancelling The expected outcome from qmp_migrate_cancel() is that the source migration goes to the terminal state MIGRATION_STATUS_CANCELLED. Anything different from this is a bug when cancelling. Make sure there is never a state transition from an unspecified state into FAILED. Code that sets FAILED should always either make sure that the old state is not CANCELLING or specify the old state. Note that the destination is allowed to go into FAILED, so there's no issue there. (I don't think this is relevant as a backport because cancelling does work; it just doesn't show the right state at the end.) Fixes: `3dde8fdbad` ("migration: Merge precopy/postcopy on switchover start") Fixes: `d0edb8a173` ("migration: Create the postcopy preempt channel asynchronously") Fixes: `8518278a6a` ("migration: implementation of background snapshot thread") Fixes: `bf78a046b9` ("migration: refactor migrate_fd_connect failures") Reviewed-by: Peter Xu <peterx@redhat.com> Message-ID: <20250213175927.19642-7-farosas@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-02-14 15:19:06 -03:00
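Requiring callers to name the old state amounts to a compare-and-swap. The sketch below uses C11 atomics and a made-up subset of states to show why this protects a cancellation: a late error path that expects ACTIVE cannot overwrite CANCELLING. It is similar in spirit to QEMU's `migrate_set_state()`, which also takes the expected old state, but the enum and `mig_set_state()` here are illustrative only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative subset of migration states; not QEMU's MigrationStatus. */
typedef enum {
    MIG_ACTIVE,
    MIG_CANCELLING,
    MIG_CANCELLED,
    MIG_FAILED,
} MigState;

/*
 * Move *state from `old` to `new_state` only if it still equals `old`.
 * Returns true if the transition happened, false if someone else
 * (e.g. a concurrent cancel) changed the state first.
 */
static bool mig_set_state(_Atomic int *state, MigState old, MigState new_state)
{
    int expected = old;
    return atomic_compare_exchange_strong(state, &expected, (int)new_state);
}
```

Usage: a cancel moves ACTIVE to CANCELLING first; an error path that then tries ACTIVE to FAILED simply fails its swap, so the source still ends in CANCELLED.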
Fabiano Rosas	646119088f	migration: Reject qmp_migrate_cancel after postcopy After postcopy has started, it's not possible to recover the source machine in case a migration error occurs because the destination has already been changing the state of the machine. For that same reason, it doesn't make sense to try to cancel the migration after postcopy has started. Reject the cancel command during postcopy. Reviewed-by: Peter Xu <peterx@redhat.com> Message-ID: <20250213175927.19642-6-farosas@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-02-14 15:19:05 -03:00
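A sketch of the check this commit describes, under stated assumptions: the state names, `check_cancel_allowed()`, and the error text are placeholders, and "in postcopy" is modeled here as just the active and paused postcopy states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative states; not QEMU's MigrationStatus enum. */
typedef enum {
    ST_ACTIVE,
    ST_DEVICE,
    ST_POSTCOPY_ACTIVE,
    ST_POSTCOPY_PAUSED,
    ST_COMPLETED,
} St;

/* Once postcopy has started, the destination already owns the guest state. */
static bool in_postcopy(St state)
{
    return state == ST_POSTCOPY_ACTIVE || state == ST_POSTCOPY_PAUSED;
}

/*
 * Returns 0 if a cancel request may proceed, or -1 with a message in
 * errbuf if it must be rejected because postcopy is in progress.
 */
static int check_cancel_allowed(St state, char *errbuf, size_t len)
{
    if (in_postcopy(state)) {
        snprintf(errbuf, len, "postcopy migration in progress, cannot cancel");
        return -1;
    }
    if (len > 0) {
        errbuf[0] = '\0';
    }
    return 0;
}
```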
Fabiano Rosas	4bbadfc55e	migration: Change migrate_fd_ to migration_ Remove all instances of _fd_ from the migration generic code. These functions have grown over time and the _fd_ part is now just confusing. migration_fd_error() -> migration_error() makes it a little vague. Since it's only used for migration_connect() failures, change it to migration_connect_set_error(). Reviewed-by: Peter Xu <peterx@redhat.com> Message-ID: <20250213175927.19642-4-farosas@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-02-14 15:19:05 -03:00
Fabiano Rosas	8444d09381	migration: Unify migration_cancel and migrate_fd_cancel There's no need for two separate functions and this _fd_ is a historic artifact that makes little sense nowadays. Reviewed-by: Peter Xu <peterx@redhat.com> Message-ID: <20250213175927.19642-3-farosas@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-02-14 15:19:05 -03:00
Fabiano Rosas	a47f0cfba8	migration: Set migration error outside of migrate_cancel There's no point passing the error into migration cancel only for it to call migrate_set_error(). Reviewed-by: Peter Xu <peterx@redhat.com> Message-ID: <20250213175927.19642-2-farosas@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-02-14 15:19:05 -03:00
Stefan Hajnoczi	f2ec48fefd	Block layer patches - Managing inactive nodes (enables QSD migration with shared storage) - Fix swapped values for BLOCK_IO_ERROR 'device' and 'qom-path' - vpc: Read images exported from Azure correctly - scripts/qemu-gdb: Support coroutine dumps in coredumps - Minor cleanups Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging # gpg: Signature made Thu 06 Feb 2025 11:12:50 EST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * tag 'for-upstream' of https://repo.or.cz/qemu/kevin: (25 commits) block: remove unused BLOCK_OP_TYPE_DATAPLANE iotests: Add (NBD-based) tests for inactive nodes iotests: Add qsd-migrate case iotests: Add filter_qtest() nbd/server: Support inactive nodes block/export: Add option to allow export of inactive nodes block: Drain nodes before inactivating them block/export: Don't ignore image activation error in blk_exp_add() block: Support inactive nodes in blk_insert_bs() block: Add blockdev-set-active QMP command block: Add option to create inactive nodes block: Fix crash on block_resize on inactive node block: Don't attach inactive child to active node migration/block-active: Remove global active flag block: Inactivate external snapshot overlays when necessary block: Allow inactivating already inactive nodes block: Add 'active' field to BlockDeviceInfo block-backend: Fix argument order when calling 'qapi_event_send_block_io_error()' scripts/qemu-gdb: Support coroutine dumps in coredumps scripts/qemu-gdb: Simplify fs_base fetching for coroutines ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2025-02-10 13:25:36 -05:00
Daniel P. Berrangé	407bc4bf90	qapi: Move include/qapi/qmp/ to include/qobject/ The general expectation is that header files should follow the same file/path naming scheme as the corresponding source file. There are various historical exceptions to this practice in QEMU, with one of the most notable being the include/qapi/qmp/ directory. Most of the headers there correspond to source files in qobject/. This patch corrects most of that inconsistency by creating include/qobject/ and moving the headers for qobject/ there. This also fixes MAINTAINERS for include/qapi/qmp/dispatch.h: scripts/get_maintainer.pl now reports "QAPI" instead of "No maintainers found". Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Acked-by: Halil Pasic <pasic@linux.ibm.com> #s390x Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20241118151235.2665921-2-armbru@redhat.com> [Rebased]	2025-02-10 15:33:16 +01:00
Kevin Wolf	c2a189976e	migration/block-active: Remove global active flag Block devices have an individual active state, a single global flag can't cover this correctly. This becomes more important as we allow users to manually manage which nodes are active or inactive. Now that it's allowed to call bdrv_inactivate_all() even when some nodes are already inactive, we can remove the flag and just unconditionally call bdrv_inactivate_all() and, more importantly, bdrv_activate_all() before we make use of the nodes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20250204211407.381505-5-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2025-02-06 14:26:51 +01:00
Peter Xu	3dde8fdbad	migration: Merge precopy/postcopy on switchover start Now after all the cleanups, finally we can merge the switchover startup phase into one single function for precopy/postcopy. Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250114230746.3268797-16-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-01-29 11:56:41 -03:00
Peter Xu	4881411136	migration: Always set DEVICE state DEVICE state was introduced back in 2017: https://lore.kernel.org/qemu-devel/20171020090556.18631-1-dgilbert@redhat.com/ Quoting Dave's cover letter, when the pre-switchover phase was enabled, the state transitions looked like this: The precopy flow is: active->pre-switchover->device->completed The postcopy flow is: active->pre-switchover->postcopy-active->completed To supplement the above, when the cap is not enabled: The precopy flow is: active->completed The postcopy flow is: active->postcopy-active->completed It works for us, though we have some code just to special-case these state transitions, so the DEVICE state is currently specific to precopy, and only conditionally. I had a quick discussion with Libvirt developers, and it turns out that this may not be necessary. IOW, it seems okay to make the DEVICE state generic, so that we don't have over-complicated state machines. Besides aligning all the migration state machines and cleaning up the code paths, especially the pre-switchover handling (see the patch itself), another side benefit is that we can unconditionally have a specific state to mark the switchover phase, which might be helpful for debugging too. This patch makes the DEVICE state always present, marking that the source QEMU is switching over. The state machine will then always be as simple as: active -> [pre-switchover ->] device -> [postcopy-active ->] completed After the change, whether or not pre-switchover or postcopy is enabled, we always have the DEVICE state showing the switchover phase. When pre-switchover is enabled, there is an extra stage before it. When postcopy is enabled, there is an extra stage after it. A few qtests need touching up in the QEMU tree for this change: - A few iotest outputs (194, 203, 234, 262, 280) - Teach libqos's migrate() about the "device" state Cc: Jiri Denemark <jdenemar@redhat.com> Cc: Daniel P. Berrangé <berrange@redhat.com> Cc: Dr. David Alan Gilbert <dave@treblig.org> Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250114230746.3268797-15-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-01-29 11:56:41 -03:00
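The simplified source-side state machine from this commit can be written as a small function that lists the states for each combination of the two options. This is only a sketch of the sequence described in the commit message; `switchover_path()` is a hypothetical name and the strings are the state names quoted there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Fill `out` with the source-side states for the given options and
 * return how many were written. `out` must hold at least 5 entries.
 * DEVICE is always present; the bracketed stages are optional.
 */
static size_t switchover_path(bool pre_switchover, bool postcopy,
                              const char *out[5])
{
    size_t n = 0;
    out[n++] = "active";
    if (pre_switchover) {
        out[n++] = "pre-switchover";    /* extra stage before DEVICE */
    }
    out[n++] = "device";                /* always marks the switchover */
    if (postcopy) {
        out[n++] = "postcopy-active";   /* extra stage after DEVICE */
    }
    out[n++] = "completed";
    return n;
}
```

With neither option enabled this yields active, device, completed; with both it yields all five states.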
Peter Xu	15c2ffa0b7	migration: Unwrap qemu_savevm_state_complete_precopy() in postcopy Postcopy has invoked qemu_savevm_state_complete_precopy() twice for a long time, and that has caused far too much confusion. Let's clean this up and make postcopy easier to read. It's actually fairly straightforward: postcopy starts by saving the non-postcopiable iterables, then later saves only the non-iterables. Moving these two calls out makes everything much easier to follow. Otherwise it's very unclear what qemu_savevm_state_complete_precopy() did in either of the calls. No functional change intended. Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250114230746.3268797-13-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-01-29 11:56:41 -03:00
Peter Xu	46b0155ecf	migration: Notify COMPLETE once for postcopy Postcopy invokes qemu_savevm_state_complete_precopy() twice, which means it invokes the COMPLETE notifier twice, and also fires the tracepoints marking precopy completion twice. Move that notification (along with the tracepoint) out to the caller, so that postcopy will only notify once, right at the start of the switchover phase from precopy. While at it, rename it to suit the file it now lives in. For precopy, there should be no functional change except that the tracepoint has a new name. For the other two users of qemu_savevm_state_complete_precopy(), namely qemu_savevm_state() and qemu_savevm_live_state(), the notifier shouldn't matter because they're not precopy at all. Now in these two contexts (aka "savevm" and "colo") the precopy notifiers will sometimes still be invoked, but that's outside the scope of this patch. Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250114230746.3268797-12-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-01-29 11:56:41 -03:00
Peter Xu	a880ddd8ce	migration: Take BQL slightly longer in postcopy_start() This paves the way for a follow-up patch that modifies migration states at the end of postcopy_start(), which is better done with the BQL held so that there's no chance of a concurrent cancellation. So we'll do slightly more work under the BQL, but it's really trivial; hopefully nothing will really change with this. A side benefit is that we can drop another explicit lock() in the failure path. Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250114230746.3268797-11-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-01-29 11:56:41 -03:00
Peter Xu	ec611bd731	migration: Drop cached migration state in migration_maybe_pause() I can't see why we must cache the state now that we have avoided the possible CANCEL race: that's the only thing I can think of that can modify the migration state concurrently with the migration thread itself. Make all the state updates happen unconditionally; then we don't need to cache the state anymore. Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250114230746.3268797-10-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-01-29 11:56:41 -03:00
Peter Xu	1f9b657cae	migration: Adjust locking in migration_maybe_pause() In migration_maybe_pause() QEMU may yield the BQL before waiting on a semaphore. However, it yields the BQL too early, which in principle gives the main thread a chance to quickly take the BQL and modify the state to CANCELLING. To prevent such a race condition from happening at all, always update the migration states within the BQL. That ensures no concurrent cancellation can ever happen. With that, IIUC there's a chance we can remove the extra parameter in migration_maybe_pause() used to update the active state, but that'll be done separately later. Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250114230746.3268797-9-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-01-29 11:56:40 -03:00
Peter Xu	40004007e6	migration: Adjust postcopy bandwidth during switchover Precopy always uses unlimited bandwidth during switchover, which makes sense because this phase is critical and no one would want to throttle bandwidth during the VM blackout. OTOH, postcopy surprisingly didn't do that. There's one line in the middle of the postcopy switchover that switches to postcopy's configured max-postcopy-bandwidth, and its placement in the middle is strange. This patch makes both modes always use unlimited bandwidth for switchover, and only applies the postcopy max bandwidth after the switchover is completed. Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250114230746.3268797-8-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-01-29 11:56:40 -03:00
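A sketch of the bandwidth policy described in this commit, under stated assumptions: the `Phase` enum, `rate_limit_for()`, and the convention that 0 means "no limit" are illustrative choices for this example, not QEMU's internals.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption of this sketch: a limit of 0 means "no rate limit". */
#define RATE_UNLIMITED 0

typedef enum { PHASE_PRECOPY, PHASE_SWITCHOVER, PHASE_POSTCOPY } Phase;

/* Pick the rate limit for the current phase of the migration. */
static uint64_t rate_limit_for(Phase phase, uint64_t max_bw,
                               uint64_t max_postcopy_bw)
{
    switch (phase) {
    case PHASE_SWITCHOVER:
        return RATE_UNLIMITED;     /* VM is stopped: never throttle */
    case PHASE_POSTCOPY:
        return max_postcopy_bw;    /* applied only once switchover is done */
    case PHASE_PRECOPY:
    default:
        return max_bw;
    }
}
```

Both precopy and postcopy pass through the same switchover case, which is the symmetry the commit aims for.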
Peter Xu	89011a702f	migration: Synchronize all CPU states only for non-iterable dump Do a one-shot CPU sync in qemu_savevm_state_complete_precopy_non_iterable(), instead of coding it separately in two places. Note that in the context of qemu_savevm_state_complete_precopy(), this patch is also an optimization for the postcopy path, in that we can avoid syncing the CPUs twice during switchover: before this patch, postcopy_start() invokes qemu_savevm_state_complete_precopy() twice, and each call tries to sync CPU info. In reality, only one of them is needed. For background snapshot, there's no intended functional change. Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250114230746.3268797-7-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-01-29 11:56:40 -03:00
Peter Xu	4822128693	migration: Drop inactivate_disk param in qemu_savevm_state_complete* This parameter is only used by one caller, which is the genuine precopy complete path (migration_completion_precopy). The parameter was introduced in `a1fbe750fd` ("migration: Fix race of image locking between src and dst") to ensure the inactivation happens before EOF, so that the destination will always be able to activate the disk properly. However, there's no limitation on how early we inactivate the disk. For the precopy completion path, we can always do that as long as the VM is stopped. Move the disk inactivation there; then we can remove this inactivate_disk parameter from the whole call stack, because all the remaining users always pass in false. Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250114230746.3268797-6-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-01-29 11:56:40 -03:00
Peter Xu	812145fcf7	migration: Avoid two src-downtime-end tracepoints for postcopy Postcopy can trigger this tracepoint twice, while only the 1st one is valid. Avoid triggering the 2nd tracepoint just like what we do with recording the total downtime. Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250114230746.3268797-5-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>	2025-01-29 11:56:40 -03:00


832 commits