qemu-cr16

Author	SHA1	Message	Date
Markus Armbruster	88be119fb1	kvm: Fix kvm_vm_ioctl() and kvm_device_ioctl() return value These functions wrap ioctl(). When ioctl() fails, it sets @errno. The wrappers then return that @errno negated. Except they call accel_ioctl_end() between calling ioctl() and reading @errno. accel_ioctl_end() can clobber @errno, e.g. when a futex() system call fails. Seems unlikely, but it's a bug all the same. Fix by retrieving @errno before calling accel_ioctl_end(). Fixes: `a27dd2de68` (KVM: keep track of running ioctls) Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20251128152050.3417834-1-armbru@redhat.com>	2025-12-02 07:46:21 +01:00
Zhenzhong Duan	725ec89803	accel/kvm: Fix an erroneous check on coalesced_mmio_ring According to KVM uAPI, coalesced mmio page is KVM_COALESCED_MMIO_PAGE_OFFSET offset from kvm_run pages. For x86 it's 2 pages offset, for arm it's 1 page offset currently. We shouldn't presume it's hardcoded 1 page or else coalesced_mmio_ring will not be cleared in do_kvm_destroy_vcpu() in x86. Fixes: `7ed0919119` ("migration: close kvm after cpr") Cc: Markus Armbruster <armbru@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Steve Sistare <steven.sistare@oracle.com> Link: https://lore.kernel.org/qemu-devel/20250928085432.40107-6-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2025-10-22 08:12:52 +02:00
Philippe Mathieu-Daudé	0de36ecb2b	accel/kvm: Factor kvm_cpu_synchronize_put() out The same code is duplicated 3 times: factor a common method. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Link: https://lore.kernel.org/r/20251008040715.81513-4-philmd@linaro.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-10-14 11:03:59 +02:00
Philippe Mathieu-Daudé	4db362f68c	system/physmem: Extract API out of 'system/ram_addr.h' header Very few files use the Physical Memory API. Declare its methods in their own header: "system/physmem.h". Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Message-Id: <20251001175448.18933-19-philmd@linaro.org>	2025-10-07 05:03:56 +02:00
Philippe Mathieu-Daudé	aa60bdb700	system/physmem: Drop 'cpu_' prefix in Physical Memory API The functions related to the Physical Memory API declared in "system/ram_addr.h" do not operate on vCPU. Remove the 'cpu_' prefix. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Message-Id: <20251001175448.18933-18-philmd@linaro.org>	2025-10-07 05:03:56 +02:00
Philippe Mathieu-Daudé	a55e8ffe47	accel/kvm: Include missing 'exec/target_page.h' header The "exec/target_page.h" header is indirectly pulled from "system/ram_addr.h". Include it explicitly, in order to avoid unrelated issues when refactoring "system/ram_addr.h": accel/kvm/kvm-all.c: In function ‘kvm_init’: accel/kvm/kvm-all.c:2636:12: error: ‘TARGET_PAGE_SIZE’ undeclared (first use in this function); did you mean ‘TARGET_PAGE_BITS’? 2636 \| assert(TARGET_PAGE_SIZE <= qemu_real_host_page_size()); \| ^~~~~~~~~~~~~~~~ Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20251001175448.18933-3-philmd@linaro.org>	2025-10-07 05:03:56 +02:00
Philippe Mathieu-Daudé	8fe6ce4019	system/ramblock: Move ram_block_discard_*_range() declarations Keep RAM blocks API in the same header: "system/ramblock.h". Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Acked-by: Peter Xu <peterx@redhat.com> Message-Id: <20251002032812.26069-4-philmd@linaro.org>	2025-10-07 03:37:03 +02:00
Xiaoyao Li	00c0911c68	accel/kvm: Set guest_memfd_offset to non-zero value only when guest_memfd is valid Current QEMU unconditionally sets the guest_memfd_offset of KVMSlot in kvm_set_phys_mem(), which leads to the trace of kvm_set_user_memory looks: kvm_set_user_memory AddrSpace#0 Slot#4 flags=0x2 gpa=0xe0000 size=0x20000 ua=0x7f5840de0000 guest_memfd=-1 guest_memfd_offset=0x3e0000 ret=0 It's confusing that the guest_memfd_offset has a non-zero value while the guest_memfd is invalid (-1). Change to only set guest_memfd_offset when guest_memfd is valid and leave it as 0 when no valid guest_memfd. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250728115707.1374614-4-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-09-17 19:01:57 +02:00
Xiaoyao Li	80030f66ad	accel/kvm: Zero out mem explicitly in kvm_set_user_memory_region() Zero out the entire mem explicitly before it's used, to ensure the unused feilds (pad1, pad2) are all zeros. Otherwise, it might cause problem when the pad fields are extended by future KVM. Fixes: `ce5a983233` ("kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot") Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20250728115707.1374614-3-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-09-17 19:01:57 +02:00
Xiaoyao Li	706cc70865	accel/kvm: Switch to check KVM_CAP_GUEST_MEMFD and KVM_CAP_USER_MEMORY2 on VM It returns more accruate result on checking KVM_CAP_GUEST_MEMFD and KVM_CAP_USER_MEMORY2 on VM instance instead of on KVM platform. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250728115707.1374614-2-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-09-17 19:01:57 +02:00
Paolo Bonzini	9a191d3782	cpus: clear exit_request in qemu_process_cpu_events Make the code common to all accelerators: after seeing cpu->exit_request set to true, accelerator code needs to reach qemu_process_cpu_events_common(). So for the common cases where they use qemu_process_cpu_events(), go ahead and clear it in there. Note that the cheap qatomic_set() is enough because at this point the thread has taken the BQL; qatomic_set_mb() is not needed. In particular, this is the ordering of the communication between I/O and vCPU threads is always the same. In the I/O thread: (a) store other memory locations that will be checked if cpu->exit_request or cpu->interrupt_request is 1 (for example cpu->stop or cpu->work_list for cpu->exit_request) (b) cpu_exit(): store-release cpu->exit_request, or (b) cpu_interrupt(): store-release cpu->interrupt_request >>> at this point, cpu->halt_cond is broadcast and the BQL released (c) do the accelerator-specific kick (e.g. write icount_decr for TCG, pthread_kill for KVM, etc.) In the vCPU thread instead the opposite order is respected: (c) the accelerator's execution loop exits thanks to the kick (b) then the inner execution loop checks cpu->interrupt_request and cpu->exit_request. If needed cpu->interrupt_request is converted into cpu->exit_request when work is needed outside the execution loop. (a) then the other memory locations are checked. Some may need to be read under the BQL, but the vCPU thread may also take other locks (e.g. for queued work items) or none at all. qatomic_set_mb() would only be needed if the halt sleep was done outside the BQL (though in that case, cpu->exit_request probably would be replaced by a QemuEvent or something like that). Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-09-17 19:00:56 +02:00
Paolo Bonzini	f084ff128b	accel: use atomic accesses for exit_request CPU threads write exit_request as a "note to self" that they need to go out to a slow path. This write happens out of the BQL and can be a data race with another threads' cpu_exit(); use atomic accesses consistently. While at it, change the source argument from int ("1") to bool ("true"). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-09-17 19:00:55 +02:00
Paolo Bonzini	ac6c8a390b	accel: use store_release/load_acquire for cross-thread exit_request Reads and writes cpu->exit_request do not use a load-acquire/store-release pair right now, but this means that cpu_exit() may not write cpu->exit_request after any flags that are read by the vCPU thread. Probably everything is protected one way or the other by the BQL, because cpu->exit_request leads to the slow path, where the CPU thread often takes the BQL (for example, to go to sleep by waiting on the BQL-protected cpu->halt_cond); but it's not clear, so use load-acquire/store-release consistently. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-09-17 19:00:55 +02:00
Ani Sinha	8e98961f6e	kvm/kvm-all: make kvm_park/unpark_vcpu local to kvm-all.c kvm_park_vcpu() and kvm_unpark_vcpu() is only used in kvm-all.c. Declare it static, remove it from common header file and make it local to kvm-all.c Signed-off-by: Ani Sinha <anisinha@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250815065445.8978-1-anisinha@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-08-27 10:57:03 +02:00
Stefan Hajnoczi	f96b157ebb	Accelerators patches - Unify x86/arm hw/xen/arch_hvm.h header - Move non-system-specific 'accel/accel-ops.h' and 'accel-cpu-ops.h' to accel/ - Move KVM definitions qapi/accelerator.json - Add @qom-type field to CpuInfoFast QAPI structure - Display CPU model name in 'info cpus' HMP command - Introduce @x-accel-stats QMP command - Add 'info accel' on HMP - Improve qemu_add_vm_change_state_handler() docstring - Extract TCG statistic related code to tcg-stats.c - Implement AccelClass::get_[vcpu]_stats() handlers for TCG and HVF - Do not dump NaN in TCG statistics - Revert incomplete "accel/tcg: Unregister the RCU before exiting RR thread" -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmh2r4UACgkQ4+MsLN6t wN5i6xAAkOvwFh1GmsPUdz5RxzsWoIUDvyENg6E8Axwe5tSEMRFiPjabbTQJomQg GZt75XIS24LZFZ+hvqrLSA+dFgXTgWv08ZE81EjwjmAMBlLCOPhCgeN6C1p8100Y scSvRJbP9k9lpA5K7et/1X4AkK2cZyh+LGJgCjr2Al2mbERpPueDF8fxqeohFvXQ nTSks4XlA0yQ06+9r49aQAiuXvgg9lDT1wIglD2HEV7vOVs/ud+yyL8+z5YMeFzx pSIc6wDu4PqdA46w4MZs90uTy7S/PMvBiYDEiV3tKzg0MLttvFGlT58/YjVtguTP mNkfwIEwQtDQzoxsFIJO7yBTlTRBs95V4aIVk3pB+Gb/bideRPIkeVQvgMSEBKj7 N0pEXWOxfB9iIWO6b1utYpQ4uxeDOU/8DPUCit1IBbNgKTaJkJb77fboYk7NaB0K KEtObAk6jMatB/xr+vUFWc4sMk9wlm72w8wcQzgKZ0xV2U3d1/Y/9nS4GvI510ev TRQ3mKj7N319uCeId1czF6W8rillCJ2u8ZK53u+Nfp7R3PbsRSMc6IDJ1UdDUlyR HFcWHxbcbEGhe8SnFGab4Qd6fWChcn2EaEoAJJz+Rqv0k3zcwqccNM5waCABAjTE 0S22JIHePJKcpkMLGq3EOUAQuu+8Zsol7gPCLxSAMclVqPTl9ck= =rAav -----END PGP SIGNATURE----- Merge tag 'accel-20250715' of https://github.com/philmd/qemu into staging Accelerators patches - Unify x86/arm hw/xen/arch_hvm.h header - Move non-system-specific 'accel/accel-ops.h' and 'accel-cpu-ops.h' to accel/ - Move KVM definitions qapi/accelerator.json - Add @qom-type field to CpuInfoFast QAPI structure - Display CPU model name in 'info cpus' HMP command - Introduce @x-accel-stats QMP command - Add 'info accel' on HMP - Improve qemu_add_vm_change_state_handler() docstring - Extract TCG statistic related code to tcg-stats.c - Implement AccelClass::get_[vcpu]_stats() handlers for TCG and HVF - Do not dump NaN in TCG statistics - Revert incomplete "accel/tcg: Unregister the RCU before exiting RR thread" # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmh2r4UACgkQ4+MsLN6t # wN5i6xAAkOvwFh1GmsPUdz5RxzsWoIUDvyENg6E8Axwe5tSEMRFiPjabbTQJomQg # GZt75XIS24LZFZ+hvqrLSA+dFgXTgWv08ZE81EjwjmAMBlLCOPhCgeN6C1p8100Y # scSvRJbP9k9lpA5K7et/1X4AkK2cZyh+LGJgCjr2Al2mbERpPueDF8fxqeohFvXQ # nTSks4XlA0yQ06+9r49aQAiuXvgg9lDT1wIglD2HEV7vOVs/ud+yyL8+z5YMeFzx # pSIc6wDu4PqdA46w4MZs90uTy7S/PMvBiYDEiV3tKzg0MLttvFGlT58/YjVtguTP # mNkfwIEwQtDQzoxsFIJO7yBTlTRBs95V4aIVk3pB+Gb/bideRPIkeVQvgMSEBKj7 # N0pEXWOxfB9iIWO6b1utYpQ4uxeDOU/8DPUCit1IBbNgKTaJkJb77fboYk7NaB0K # KEtObAk6jMatB/xr+vUFWc4sMk9wlm72w8wcQzgKZ0xV2U3d1/Y/9nS4GvI510ev # TRQ3mKj7N319uCeId1czF6W8rillCJ2u8ZK53u+Nfp7R3PbsRSMc6IDJ1UdDUlyR # HFcWHxbcbEGhe8SnFGab4Qd6fWChcn2EaEoAJJz+Rqv0k3zcwqccNM5waCABAjTE # 0S22JIHePJKcpkMLGq3EOUAQuu+8Zsol7gPCLxSAMclVqPTl9ck= # =rAav # -----END PGP SIGNATURE----- # gpg: Signature made Tue 15 Jul 2025 15:44:05 EDT # gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE # gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full] # Primary key fingerprint: FAAB E75E 1291 7221 DCFD 6BB2 E3E3 2C2C DEAD C0DE * tag 'accel-20250715' of https://github.com/philmd/qemu: system/runstate: Document qemu_add_vm_change_state_handler_prio* in hdr system/runstate: Document qemu_add_vm_change_state_handler() accel/hvf: Implement AccelClass::get_vcpu_stats() handler accel/tcg: Implement AccelClass::get_stats() handler accel/tcg: Propagate AccelState to dump_accel_info() accel/system: Add 'info accel' on human monitor accel/system: Introduce @x-accel-stats QMP command accel/tcg: Extract statistic related code to tcg-stats.c Revert "accel/tcg: Unregister the RCU before exiting RR thread" accel: Extract AccelClass definition to 'accel/accel-ops.h' accel: Rename 'system/accel-ops.h' -> 'accel/accel-cpu-ops.h' accel/tcg: Do not dump NaN statistics hw/core/machine: Display CPU model name in 'info cpus' command qapi/machine: Add @qom-type field to CpuInfoFast structure qapi/accel: Move definitions related to accelerators in their own file hw/arm/xen-pvh: Remove unnecessary 'hw/xen/arch_hvm.h' header hw/xen/arch_hvm: Unify x86 and ARM variants Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Conflicts: qapi/machine.json Commit `0462da9d6b` ("qapi: remove trivial "Returns:" sections") removed trivial "Returns:". This caused a conflict with the move from machine.json to accelerator.json.	2025-07-16 07:13:40 -04:00
Philippe Mathieu-Daudé	f7a7e7dd21	accel: Extract AccelClass definition to 'accel/accel-ops.h' Only accelerator implementations (and the common accelator code) need to know about AccelClass internals. Move the definition out but forward declare AccelState and AccelClass. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20250703173248.44995-39-philmd@linaro.org>	2025-07-15 19:34:33 +02:00
Thomas Huth	f180e367fc	accel/kvm: Adjust the note about the minimum required kernel version Since commit `126e7f7803` ("kvm: require KVM_CAP_IOEVENTFD and KVM_CAP_IOEVENTFD_ANY_LENGTH") we require at least kernel 4.5 to be able to use KVM. Adjust the upgrade_note accordingly. While we're at it, remove the text about kvm-kmod and the SourceForge URL since this is not actively maintained anymore. Fixes: `126e7f7803` ("kvm: require KVM_CAP_IOEVENTFD and KVM_CAP_IOEVENTFD_ANY_LENGTH") Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2025-07-13 12:08:07 +03:00
Stefan Hajnoczi	989dd906ed	Accelerators patches - Generic API consolidation, cleanups (dead code removal, documentation added) - Remove monitor TCG 'info opcount' and @x-query-opcount - Have HVF / NVMM / WHPX use generic CPUState::vcpu_dirty field - Expose nvmm_enabled() and whpx_enabled() to common code - Have hmp_info_registers() dump vector registers -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmhnql4ACgkQ4+MsLN6t wN6Lfg//R4h6dyAg02hyopwb/DSI97hAsD9kap15ro1qszYrIOkJcEPoE37HDi6d O0Ls+8NPpJcnMwdghHvVaRGoIH2OY5ogXKo6UK1BbOn8iAGxRrT/IPVCyFbPmQoe Bk78Z/wne/YgCXiW4HGHSJO5sL04AQqcFYnwjisHHf3Ox8RR85LbhWqthZluta4i a/Y8W5UO7jfwhAl1/Zb2cU+Rv75I6xcaLQAfmbt4j+wHP52I2cjLpIYo4sCn+ULJ AVX4q4MKrkDrr6CYPXxdGJzYEzVn9evynVcQoRzL6bLZFMpa284AzVd3kQg9NWAb p1hvKJTA57q4XDoD50qVGLhP207VVSUcdm0r2ZJA2jag5ddoT+x2talz8/f6In1b 7BrSM/pla8x9KvTne/ko0wSL0o2dOWyig8mBxARLZWPxk+LBVs1PBZfvn+3j1pYA rWV25Ht4QJlUYMbe3NvEIomsVThKg8Fh3b4mEuyPM+LZ1brgmhrzJG1SF+G4fH8A aig/RVqgNHtajSnG4A723k2/QzlvnAiT7E3dKB5FogjTcVzFRaWFKsUb4ORqsCAz c/AheCJY4PP3pAnb0ODISSVviXwAXqCLbtZhDGhHNYl3C69EyGPPMiVxCaIxKDxU bF7AIYhRTTMyNSbnkcRS3UDO/gZS7x5/K+/YAM9akQEYADIodYM= =Vb39 -----END PGP SIGNATURE----- Merge tag 'accel-20250704' of https://github.com/philmd/qemu into staging Accelerators patches - Generic API consolidation, cleanups (dead code removal, documentation added) - Remove monitor TCG 'info opcount' and @x-query-opcount - Have HVF / NVMM / WHPX use generic CPUState::vcpu_dirty field - Expose nvmm_enabled() and whpx_enabled() to common code - Have hmp_info_registers() dump vector registers # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmhnql4ACgkQ4+MsLN6t # wN6Lfg//R4h6dyAg02hyopwb/DSI97hAsD9kap15ro1qszYrIOkJcEPoE37HDi6d # O0Ls+8NPpJcnMwdghHvVaRGoIH2OY5ogXKo6UK1BbOn8iAGxRrT/IPVCyFbPmQoe # Bk78Z/wne/YgCXiW4HGHSJO5sL04AQqcFYnwjisHHf3Ox8RR85LbhWqthZluta4i # a/Y8W5UO7jfwhAl1/Zb2cU+Rv75I6xcaLQAfmbt4j+wHP52I2cjLpIYo4sCn+ULJ # AVX4q4MKrkDrr6CYPXxdGJzYEzVn9evynVcQoRzL6bLZFMpa284AzVd3kQg9NWAb # p1hvKJTA57q4XDoD50qVGLhP207VVSUcdm0r2ZJA2jag5ddoT+x2talz8/f6In1b # 7BrSM/pla8x9KvTne/ko0wSL0o2dOWyig8mBxARLZWPxk+LBVs1PBZfvn+3j1pYA # rWV25Ht4QJlUYMbe3NvEIomsVThKg8Fh3b4mEuyPM+LZ1brgmhrzJG1SF+G4fH8A # aig/RVqgNHtajSnG4A723k2/QzlvnAiT7E3dKB5FogjTcVzFRaWFKsUb4ORqsCAz # c/AheCJY4PP3pAnb0ODISSVviXwAXqCLbtZhDGhHNYl3C69EyGPPMiVxCaIxKDxU # bF7AIYhRTTMyNSbnkcRS3UDO/gZS7x5/K+/YAM9akQEYADIodYM= # =Vb39 # -----END PGP SIGNATURE----- # gpg: Signature made Fri 04 Jul 2025 06:18:06 EDT # gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE # gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full] # Primary key fingerprint: FAAB E75E 1291 7221 DCFD 6BB2 E3E3 2C2C DEAD C0DE * tag 'accel-20250704' of https://github.com/philmd/qemu: (31 commits) MAINTAINERS: Add me as reviewer of overall accelerators section monitor/hmp-cmds-target: add CPU_DUMP_VPU in hmp_info_registers() accel: Pass AccelState argument to gdbstub_supported_sstep_flags() accel: Remove unused MachineState argument of AccelClass::setup_post() accel: Directly pass AccelState argument to AccelClass::has_memory() accel/kvm: Directly pass KVMState argument to do_kvm_create_vm() accel/kvm: Prefer local AccelState over global MachineState::accel accel/tcg: Prefer local AccelState over global current_accel() accel: Propagate AccelState to AccelClass::init_machine() accel: Keep reference to AccelOpsClass in AccelClass accel: Expose and register generic_handle_interrupt() accel/dummy: Extract 'dummy-cpus.h' header from 'system/cpus.h' accel/whpx: Expose whpx_enabled() to common code accel/nvmm: Expose nvmm_enabled() to common code accel/system: Document cpu_synchronize_state_post_init/reset() accel/system: Document cpu_synchronize_state() accel/kvm: Remove kvm_cpu_synchronize_state() stub accel/whpx: Replace @dirty field by generic CPUState::vcpu_dirty field accel/nvmm: Replace @dirty field by generic CPUState::vcpu_dirty field accel/hvf: Replace @dirty field by generic CPUState::vcpu_dirty field ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2025-07-04 08:58:49 -04:00
Philippe Mathieu-Daudé	842e7eecd4	accel: Pass AccelState argument to gdbstub_supported_sstep_flags() In order to have AccelClass methods instrospect their state, we need to pass AccelState by argument. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20250703173248.44995-37-philmd@linaro.org>	2025-07-04 12:08:44 +02:00
Philippe Mathieu-Daudé	8dd5e6befc	accel: Directly pass AccelState argument to AccelClass::has_memory() Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20250703173248.44995-34-philmd@linaro.org>	2025-07-04 12:08:44 +02:00
Philippe Mathieu-Daudé	583d1c8f16	accel/kvm: Directly pass KVMState argument to do_kvm_create_vm() Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20250703173248.44995-35-philmd@linaro.org>	2025-07-04 12:08:44 +02:00
Philippe Mathieu-Daudé	f0db25adcf	accel/kvm: Prefer local AccelState over global MachineState::accel Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20250703173248.44995-32-philmd@linaro.org>	2025-07-04 12:08:44 +02:00
Philippe Mathieu-Daudé	51e1896199	accel: Propagate AccelState to AccelClass::init_machine() In order to avoid init_machine() to call current_accel(), pass AccelState along. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20250703173248.44995-31-philmd@linaro.org>	2025-07-04 12:08:44 +02:00
Philippe Mathieu-Daudé	06810394fd	accel/kvm: Reduce kvm_create_vcpu() declaration scope kvm_create_vcpu() is only used within the same file unit. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Message-Id: <20250703173248.44995-7-philmd@linaro.org>	2025-07-04 12:02:52 +02:00
Steve Sistare	7ed0919119	migration: close kvm after cpr cpr-transfer breaks vfio network connectivity to and from the guest, and the host system log shows: irq bypass consumer (token 00000000a03c32e5) registration fails: -16 which is EBUSY. This occurs because KVM descriptors are still open in the old QEMU process. Close them. Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/qemu-devel/1751493538-202042-4-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>	2025-07-03 13:42:28 +02:00
Chenyi Qiang	2fde3fb916	physmem: Support coordinated discarding of RAM with guest_memfd A new field, attributes, was introduced in RAMBlock to link to a RamBlockAttributes object, which centralizes all guest_memfd related information (such as fd and status bitmap) within a RAMBlock. Create and initialize the RamBlockAttributes object upon ram_block_add(). Meanwhile, register the object in the target RAMBlock's MemoryRegion. After that, guest_memfd-backed RAMBlock is associated with the RamDiscardManager interface, and the users can execute RamDiscardManager specific handling. For example, VFIO will register the RamDiscardListener and get notifications when the state_change() helper invokes. As coordinate discarding of RAM with guest_memfd is now supported, only block uncoordinated discard. Tested-by: Alexey Kardashevskiy <aik@amd.com> Reviewed-by: Alexey Kardashevskiy <aik@amd.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Link: https://lore.kernel.org/r/20250612082747.51539-6-chenyi.qiang@intel.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-06-23 16:03:59 -04:00
Tom Lendacky	4cdc489eb9	i386/kvm: Prefault memory on page state change A page state change is typically followed by an access of the page(s) and results in another VMEXIT in order to map the page into the nested page table. Depending on the size of page state change request, this can generate a number of additional VMEXITs. For example, under SNP, when Linux is utilizing lazy memory acceptance, memory is typically accepted in 4M chunks. A page state change request is submitted to mark the pages as private, followed by validation of the memory. Since the guest_memfd currently only supports 4K pages, each page validation will result in VMEXIT to map the page, resulting in 1024 additional exits. When performing a page state change, invoke KVM_PRE_FAULT_MEMORY for the size of the page state change in order to pre-map the pages and avoid the additional VMEXITs. This helps speed up boot times. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/f5411c42340bd2f5c14972551edb4e959995e42b.1743193824.git.thomas.lendacky@amd.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-06-06 14:32:54 +02:00
Xiaoyao Li	b4b7fb5a77	cpu: Don't set vcpu_dirty when guest_state_protected QEMU calls kvm_arch_put_registers() when vcpu_dirty is true in kvm_vcpu_exec(). However, for confidential guest, like TDX, putting registers is disallowed due to guest state is protected. Only set vcpu_dirty to true with guest state is not protected when creating the vcpu. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-43-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-05-28 19:35:54 +02:00
Xiaoyao Li	77b5403a02	kvm: Check KVM_CAP_MAX_VCPUS at vm level KVM with TDX support starts to report different KVM_CAP_MAX_VCPUS per different VM types. So switch to check the KVM_CAP_MAX_VCPUS at vm level. KVM still returns the global KVM_CAP_MAX_VCPUS when the KVM is old that doesn't report different value at vm level. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-31-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-05-28 19:35:54 +02:00
Xiaoyao Li	a668268dc0	kvm: Introduce kvm_arch_pre_create_vcpu() Introduce kvm_arch_pre_create_vcpu(), to perform arch-dependent work prior to create any vcpu. This is for i386 TDX because it needs call TDX_INIT_VM before creating any vcpu. The specific implementation for i386 will be added in the future patch. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250508150002.689633-8-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-05-28 19:01:40 +02:00
Philippe Mathieu-Daudé	56f8fb6886	accel/kvm: Use target_needs_bswap() Check whether we need to swap at runtime using target_needs_bswap(). Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20250417131004.47205-3-philmd@linaro.org>	2025-04-25 17:09:58 +02:00
Philippe Mathieu-Daudé	12d1a768bd	qom: Have class_init() take a const data argument Mechanical change using gsed, then style manually adapted to pass checkpatch.pl script. Suggested-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20250424194905.82506-4-philmd@linaro.org>	2025-04-25 17:00:41 +02:00
Pierrick Bouvier	f1d2a8e953	accel/kvm: move KVM_HAVE_MCE_INJECTION define to kvm-all.c This define is used only in accel/kvm/kvm-all.c, so we push directly the definition there. Add more visibility to kvm_arch_on_sigbus_vcpu() to allow removing this define from any header. The architectures defining KVM_HAVE_MCE_INJECTION are i386, x86_64 and aarch64. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20250325045915.994760-18-pierrick.bouvier@linaro.org>	2025-04-23 15:04:57 -07:00
Richard Henderson	4705a71db5	include/system: Move exec/ram_addr.h to system/ram_addr.h Convert the existing includes with sed. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2025-04-23 14:08:24 -07:00
Richard Henderson	8be545ba5a	include/system: Move exec/memory.h to system/memory.h Convert the existing includes with sed -i ,exec/memory.h,system/memory.h,g Move the include within cpu-all.h into a !CONFIG_USER_ONLY block. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2025-04-23 14:08:21 -07:00
Maciej S. Szmigiero	f6b5f71f04	target/i386: Reset parked vCPUs together with the online ones Commit `3f2a05b31e` ("target/i386: Reset TSCs of parked vCPUs too on VM reset") introduced a way to reset TSCs of parked vCPUs during VM reset to prevent them getting desynchronized with the online vCPUs and therefore causing the KVM PV clock to lose PVCLOCK_TSC_STABLE_BIT. The way this was done was by registering a parked vCPU-specific QEMU reset callback via qemu_register_reset(). However, it turns out that on particularly device-rich VMs QEMU reset callbacks can take a long time to execute (which isn't surprising, considering that they involve resetting all of VM devices). In particular, their total runtime can exceed the 1-second TSC synchronization window introduced in KVM commit 5d3cb0f6a8e3 ("KVM: Improve TSC offset matching"). Since the TSCs of online vCPUs are only reset from "synchronize_post_reset" AccelOps handler (which runs after all qemu_register_reset() handlers) this essentially makes that fix ineffective on these VMs. The easiest way to guarantee that these parked vCPUs are reset at the same time as the online ones (regardless how long it takes for VM devices to reset) is to piggyback on post-reset vCPU synchronization handler for one of online vCPUs - as there is no generic post-reset AccelOps handler that isn't per-vCPU. The first online vCPU was selected for that since it is easily available under "first_cpu" define. This does not create an ordering issue since the order of vCPU TSC resets does not matter. Fixes: `3f2a05b31e` ("target/i386: Reset TSCs of parked vCPUs too on VM reset") Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/r/e8b85a5915f79aa177ca49eccf0e9b534470c1cd.1743099810.git.maciej.szmigiero@oracle.com Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-04-17 18:23:26 +02:00
William Roche	c1cda1c5f8	system/physmem: handle hugetlb correctly in qemu_ram_remap() The list of hwpoison pages used to remap the memory on reset is based on the backend real page size. To correctly handle hugetlb, we must mmap(MAP_FIXED) a complete hugetlb page; hugetlb pages cannot be partially mapped. Signed-off-by: William Roche <william.roche@oracle.com> Co-developed-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20250211212707.302391-2-william.roche@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>	2025-02-12 11:33:13 -05:00
Stefan Hajnoczi	65cb7129f4	Accel & Exec patch queue - Ignore writes to CNTP_CTL_EL0 on HVF ARM (Alexander) - Add '-d invalid_mem' logging option (Zoltan) - Create QOM containers explicitly (Peter) - Rename sysemu/ -> system/ (Philippe) - Re-orderning of include/exec/ headers (Philippe) Move a lot of declarations from these legacy mixed bag headers: . "exec/cpu-all.h" . "exec/cpu-common.h" . "exec/cpu-defs.h" . "exec/exec-all.h" . "exec/translate-all" to these more specific ones: . "exec/page-protection.h" . "exec/translation-block.h" . "user/cpu_loop.h" . "user/guest-host.h" . "user/page-protection.h" -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmdlnyAACgkQ4+MsLN6t wN6mBw//QFWi7CrU+bb8KMM53kOU9C507tjn99LLGFb5or73/umDsw6eo/b8DHBt KIwGLgATel42oojKfNKavtAzLK5rOrywpboPDpa3SNeF1onW+99NGJ52LQUqIX6K A6bS0fPdGG9ZzEuPpbjDXlp++0yhDcdSgZsS42fEsT7Dyj5gzJYlqpqhiXGqpsn8 4Y0UMxSL21K3HEexlzw2hsoOBFA3tUm2ujNDhNkt8QASr85yQVLCypABJnuoe/// 5Ojl5wTBeDwhANET0rhwHK8eIYaNboiM9fHopJYhvyw1bz6yAu9jQwzF/MrL3s/r xa4OBHBy5mq2hQV9Shcl3UfCQdk/vDaYaWpgzJGX8stgMGYfnfej1SIl8haJIfcl VMX8/jEFdYbjhO4AeGRYcBzWjEJymkDJZoiSWp2NuEDi6jqIW+7yW1q0Rnlg9lay ShAqLK5Pv4zUw3t0Jy3qv9KSW8sbs6PQxtzXjk8p97rTf76BJ2pF8sv1tVzmsidP 9L92Hv5O34IqzBu2oATOUZYJk89YGmTIUSLkpT7asJZpBLwNM2qLp5jO00WVU0Sd +kAn324guYPkko/TVnjC/AY7CMu55EOtD9NU35k3mUAnxXT9oDUeL4NlYtfgrJx6 x1Nzr2FkS68+wlPAFKNSSU5lTjsjNaFM0bIJ4LCNtenJVP+SnRo= =cjz8 -----END PGP SIGNATURE----- Merge tag 'exec-20241220' of https://github.com/philmd/qemu into staging Accel & Exec patch queue - Ignore writes to CNTP_CTL_EL0 on HVF ARM (Alexander) - Add '-d invalid_mem' logging option (Zoltan) - Create QOM containers explicitly (Peter) - Rename sysemu/ -> system/ (Philippe) - Re-orderning of include/exec/ headers (Philippe) Move a lot of declarations from these legacy mixed bag headers: . "exec/cpu-all.h" . "exec/cpu-common.h" . "exec/cpu-defs.h" . "exec/exec-all.h" . "exec/translate-all" to these more specific ones: . "exec/page-protection.h" . "exec/translation-block.h" . "user/cpu_loop.h" . "user/guest-host.h" . "user/page-protection.h" # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmdlnyAACgkQ4+MsLN6t # wN6mBw//QFWi7CrU+bb8KMM53kOU9C507tjn99LLGFb5or73/umDsw6eo/b8DHBt # KIwGLgATel42oojKfNKavtAzLK5rOrywpboPDpa3SNeF1onW+99NGJ52LQUqIX6K # A6bS0fPdGG9ZzEuPpbjDXlp++0yhDcdSgZsS42fEsT7Dyj5gzJYlqpqhiXGqpsn8 # 4Y0UMxSL21K3HEexlzw2hsoOBFA3tUm2ujNDhNkt8QASr85yQVLCypABJnuoe/// # 5Ojl5wTBeDwhANET0rhwHK8eIYaNboiM9fHopJYhvyw1bz6yAu9jQwzF/MrL3s/r # xa4OBHBy5mq2hQV9Shcl3UfCQdk/vDaYaWpgzJGX8stgMGYfnfej1SIl8haJIfcl # VMX8/jEFdYbjhO4AeGRYcBzWjEJymkDJZoiSWp2NuEDi6jqIW+7yW1q0Rnlg9lay # ShAqLK5Pv4zUw3t0Jy3qv9KSW8sbs6PQxtzXjk8p97rTf76BJ2pF8sv1tVzmsidP # 9L92Hv5O34IqzBu2oATOUZYJk89YGmTIUSLkpT7asJZpBLwNM2qLp5jO00WVU0Sd # +kAn324guYPkko/TVnjC/AY7CMu55EOtD9NU35k3mUAnxXT9oDUeL4NlYtfgrJx6 # x1Nzr2FkS68+wlPAFKNSSU5lTjsjNaFM0bIJ4LCNtenJVP+SnRo= # =cjz8 # -----END PGP SIGNATURE----- # gpg: Signature made Fri 20 Dec 2024 11:45:20 EST # gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE # gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: FAAB E75E 1291 7221 DCFD 6BB2 E3E3 2C2C DEAD C0DE * tag 'exec-20241220' of https://github.com/philmd/qemu: (59 commits) util/qemu-timer: fix indentation meson: Do not define CONFIG_DEVICES on user emulation system/accel-ops: Remove unnecessary 'exec/cpu-common.h' header system/numa: Remove unnecessary 'exec/cpu-common.h' header hw/xen: Remove unnecessary 'exec/cpu-common.h' header target/mips: Drop left-over comment about Jazz machine target/mips: Remove tswap() calls in semihosting uhi_fstat_cb() target/xtensa: Remove tswap() calls in semihosting simcall() helper accel/tcg: Un-inline translator_is_same_page() accel/tcg: Include missing 'exec/translation-block.h' header accel/tcg: Move tcg_cflags_has/set() to 'exec/translation-block.h' accel/tcg: Restrict curr_cflags() declaration to 'internal-common.h' qemu/coroutine: Include missing 'qemu/atomic.h' header exec/translation-block: Include missing 'qemu/atomic.h' header accel/tcg: Declare cpu_loop_exit_requested() in 'exec/cpu-common.h' exec/cpu-all: Include 'cpu.h' earlier so MMU_USER_IDX is always defined target/sparc: Move sparc_restore_state_to_opc() to cpu.c target/sparc: Uninline cpu_get_tb_cpu_state() target/loongarch: Declare loongarch_cpu_dump_state() locally user: Move various declarations out of 'exec/exec-all.h' ... Conflicts: hw/char/riscv_htif.c hw/intc/riscv_aplic.c target/s390x/cpu.c Apply sysemu header path changes to not in the pull request. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2024-12-21 11:07:00 -05:00
Philippe Mathieu-Daudé	32cad1ffb8	include: Rename sysemu/ -> system/ Headers in include/sysemu/ are not only related to system emulation, they are also used by virtualization. Rename as system/ which is clearer. Files renamed manually then mechanical change using sed tool. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Lei Yang <leiyang@redhat.com> Message-Id: <20241203172445.28576-1-philmd@linaro.org>	2024-12-20 17:44:56 +01:00
Maciej S. Szmigiero	3f2a05b31e	target/i386: Reset TSCs of parked vCPUs too on VM reset Since commit `5286c36622` ("target/i386: properly reset TSC on reset") QEMU writes the special value of "1" to each online vCPU TSC on VM reset to reset it. However parked vCPUs don't get that handling and due to that their TSCs get desynchronized when the VM gets reset. This in turn causes KVM to turn off PVCLOCK_TSC_STABLE_BIT in its exported PV clock. Note that KVM has no understanding of vCPU being currently parked. Without PVCLOCK_TSC_STABLE_BIT the sched clock is marked unstable in the guest's kvm_sched_clock_init(). This causes a performance regressions to show in some tests. Fix this issue by writing the special value of "1" also to TSCs of parked vCPUs on VM reset. Reproducing the issue: 1) Boot a VM with "-smp 2,maxcpus=3" or similar 2) device_add host-x86_64-cpu,id=vcpu,node-id=0,socket-id=0,core-id=2,thread-id=0 3) Wait a few seconds 4) device_del vcpu 5) Inside the VM run: # echo "t" >/proc/sysrq-trigger; dmesg \| grep sched_clock_stable Observe the sched_clock_stable() value is 1. 6) Reboot the VM 7) Once the VM boots once again run inside it: # echo "t" >/proc/sysrq-trigger; dmesg \| grep sched_clock_stable Observe the sched_clock_stable() value is now 0. Fixes: `5286c36622` ("target/i386: properly reset TSC on reset") Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/r/5a605a88e9a231386dc803c60f5fed9b48108139.1734014926.git.maciej.szmigiero@oracle.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-12-19 19:36:38 +01:00
Paolo Bonzini	b8f8c10f85	kvm: consistently return 0/-errno from kvm_convert_memory In case of incorrect parameters, kvm_convert_memory() was returning -1 instead of -EINVAL. The guest won't notice because it will move anyway to RUN_STATE_INTERNAL_ERROR, but fix this for consistency and clarity. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-12-19 19:36:37 +01:00
Paolo Bonzini	586d708c1e	accel/kvm: check for KVM_CAP_MEMORY_ATTRIBUTES on vm The exact set of available memory attributes can vary by VM. In the future it might vary depending on enabled capabilities, too. Query the extension on the VM level instead of on the KVM level, and only after architecture-specific initialization. Inspired by an analogous patch by Tom Dohrmann. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-10-17 19:41:30 +02:00
Paolo Bonzini	60de433d4c	accel/kvm: check for KVM_CAP_MULTI_ADDRESS_SPACE on vm KVM_CAP_MULTI_ADDRESS_SPACE used to be a global capability, but with the introduction of AMD SEV-SNP confidential VMs, the number of address spaces can vary by VM type. Query the extension on the VM level instead of on the KVM level. Inspired by an analogous patch by Tom Dohrmann. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-10-17 19:41:30 +02:00
Tom Dohrmann	64e0e63ea1	accel/kvm: check for KVM_CAP_READONLY_MEM on VM KVM_CAP_READONLY_MEM used to be a global capability, but with the introduction of AMD SEV-SNP confidential VMs, this extension is not always available on all VM types [1,2]. Query the extension on the VM level instead of on the KVM level. [1] https://patchwork.kernel.org/project/kvm/patch/20240809190319.1710470-2-seanjc@google.com/ [2] https://patchwork.kernel.org/project/kvm/patch/20240902144219.3716974-1-erbse.13@gmx.de/ Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Tom Dohrmann <erbse.13@gmx.de> Link: https://lore.kernel.org/r/20240903062953.3926498-1-erbse.13@gmx.de Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-10-17 19:41:30 +02:00
Peter Xu	943c742868	KVM: Rename KVMState->nr_slots to nr_slots_max This value used to reflect the maximum supported memslots from KVM kernel. Rename it to be clearer. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20240917163835.194664-5-peterx@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-10-17 19:41:30 +02:00
Peter Xu	dbdc00ba5b	KVM: Rename KVMMemoryListener.nr_used_slots to nr_slots_used This will make all nr_slots counters to be named in the same manner. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20240917163835.194664-4-peterx@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-10-17 19:41:30 +02:00
Peter Xu	b34a908c8f	KVM: Define KVM_MEMSLOTS_NUM_MAX_DEFAULT Make the default max nr_slots a macro, it's only used when KVM reports nothing. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20240917163835.194664-3-peterx@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-10-17 19:41:30 +02:00
Peter Xu	5504a81261	KVM: Dynamic sized kvm memslots array Zhiyi reported an infinite loop issue in VFIO use case. The cause of that was a separate discussion, however during that I found a regression of dirty sync slowness when profiling. Each KVMMemoryListerner maintains an array of kvm memslots. Currently it's statically allocated to be the max supported by the kernel. However after Linux commit 4fc096a99e ("KVM: Raise the maximum number of user memslots"), the max supported memslots reported now grows to some number large enough so that it may not be wise to always statically allocate with the max reported. What's worse, QEMU kvm code still walks all the allocated memslots entries to do any form of lookups. It can drastically slow down all memslot operations because each of such loop can run over 32K times on the new kernels. Fix this issue by making the memslots to be allocated dynamically. Here the initial size was set to 16 because it should cover the basic VM usages, so that the hope is the majority VM use case may not even need to grow at all (e.g. if one starts a VM with ./qemu-system-x86_64 by default it'll consume 9 memslots), however not too large to waste memory. There can also be even better way to address this, but so far this is the simplest and should be already better even than before we grow the max supported memslots. For example, in the case of above issue when VFIO was attached on a 32GB system, there are only ~10 memslots used. So it could be good enough as of now. In the above VFIO context, measurement shows that the precopy dirty sync shrinked from ~86ms to ~3ms after this patch applied. It should also apply to any KVM enabled VM even without VFIO. NOTE: we don't have a FIXES tag for this patch because there's no real commit that regressed this in QEMU. Such behavior existed for a long time, but only start to be a problem when the kernel reports very large nr_slots_max value. However that's pretty common now (the kernel change was merged in 2021) so we attached cc:stable because we'll want this change to be backported to stable branches. Cc: qemu-stable <qemu-stable@nongnu.org> Reported-by: Zhiyi Guo <zhguo@redhat.com> Tested-by: Zhiyi Guo <zhguo@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240917163835.194664-2-peterx@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-10-17 19:41:30 +02:00
Julia Suvorova	a1676bb304	kvm: Allow kvm_arch_get/put_registers to accept Error** This is necessary to provide discernible error messages to the caller. Signed-off-by: Julia Suvorova <jusual@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20240927104743.218468-2-jusual@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-10-03 22:04:19 +02:00
Ani Sinha	28ed7f9761	accel/kvm: refactor dirty ring setup Refactor setting up of dirty ring code in kvm_init() so that is can be reused in the future patchsets. Signed-off-by: Ani Sinha <anisinha@redhat.com> Link: https://lore.kernel.org/r/20240912061838.4501-1-anisinha@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-10-03 19:33:55 +02:00

1 2 3 4 5 ...

260 commits