qemu-cr16

Author	SHA1	Message	Date
Xiaoyao Li	b5ff08e64e	i386/kvm: Drop KVM_CAP_X86_SMM check in kvm_arch_init() x86_machine_is_smm_enabled() checks the KVM_CAP_X86_SMM for KVM case. No need to check KVM_CAP_X86_SMM in kvm_arch_init(). So just drop the check of KVM_CAP_X86_SMM to simplify the code. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250729062014.1669578-3-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-09-17 19:01:55 +02:00
Xiaoyao Li	591f817d81	target/i386: Define enum X86ASIdx for x86's address spaces Define X86ASIdx as enum, like ARM's ARMASIdx, so that it's clear index 0 is for memory and index 1 is for SMM. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Tested-By: Kirill Martynov <stdcalllevi@yandex-team.ru> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250730095253.1833411-3-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-09-17 19:00:58 +02:00
Xiaoyao Li	0516f4b702	i386/cpu: Enable SMM cpu address space under KVM Kirill Martynov reported an assertion in cpu_asidx_from_attrs() being hit when x86_cpu_dump_state() is called to dump the CPU state[1]. It happens when the CPU is in SMM and KVM emulation fails due to a misbehaving guest. The root cause is that QEMU i386 has never enabled the SMM address space for the CPU since KVM SMM support was added. Enable the SMM cpu address space under KVM when SMM is enabled for the x86 machine. [1] https://lore.kernel.org/qemu-devel/20250523154431.506993-1-stdcalllevi@yandex-team.ru/ Reported-by: Kirill Martynov <stdcalllevi@yandex-team.ru> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Tested-by: Kirill Martynov <stdcalllevi@yandex-team.ru> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250730095253.1833411-2-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-09-17 19:00:58 +02:00
Paolo Bonzini	d5e33b5f8f	accel: make all calls to qemu_process_cpu_events look the same There is no reason for some accelerators to use qemu_process_cpu_events_common (which is separated from qemu_process_cpu_events() specifically for round robin TCG). They can also check for events directly on the first pass through the loop, instead of setting cpu->exit_request to true. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-09-17 19:00:56 +02:00
Paolo Bonzini	9a191d3782	cpus: clear exit_request in qemu_process_cpu_events Make the code common to all accelerators: after seeing cpu->exit_request set to true, accelerator code needs to reach qemu_process_cpu_events_common(). So for the common cases where they use qemu_process_cpu_events(), go ahead and clear it in there. Note that the cheap qatomic_set() is enough because at this point the thread has taken the BQL; qatomic_set_mb() is not needed. In particular, the ordering of the communication between I/O and vCPU threads is always the same. In the I/O thread: (a) store other memory locations that will be checked if cpu->exit_request or cpu->interrupt_request is 1 (for example cpu->stop or cpu->work_list for cpu->exit_request) (b) cpu_exit(): store-release cpu->exit_request, or (b) cpu_interrupt(): store-release cpu->interrupt_request >>> at this point, cpu->halt_cond is broadcast and the BQL released (c) do the accelerator-specific kick (e.g. write icount_decr for TCG, pthread_kill for KVM, etc.) In the vCPU thread instead the opposite order is respected: (c) the accelerator's execution loop exits thanks to the kick (b) then the inner execution loop checks cpu->interrupt_request and cpu->exit_request. If needed cpu->interrupt_request is converted into cpu->exit_request when work is needed outside the execution loop. (a) then the other memory locations are checked. Some may need to be read under the BQL, but the vCPU thread may also take other locks (e.g. for queued work items) or none at all. qatomic_set_mb() would only be needed if the halt sleep was done outside the BQL (though in that case, cpu->exit_request probably would be replaced by a QemuEvent or something like that). Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-09-17 19:00:56 +02:00
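The (a)/(b) ordering described in the commit message above can be sketched with plain C11 atomics. This is only an illustration under stated assumptions: `struct demo_cpu` and the `demo_*` functions are hypothetical stand-ins, not QEMU's `CPUState`, `cpu_exit()` or `qatomic_*()` helpers, and the accelerator kick and the BQL are left out entirely.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two kinds of state the message talks about. */
struct demo_cpu {
    bool stop;                 /* (a) ordinary data examined on the slow path */
    atomic_bool exit_request;  /* (b) flag published to the vCPU thread */
};

/* I/O-thread side: write the payload (a) first, then store-release the flag (b). */
static inline void demo_cpu_exit(struct demo_cpu *cpu)
{
    cpu->stop = true;
    atomic_store_explicit(&cpu->exit_request, true, memory_order_release);
}

/*
 * vCPU-thread side: load-acquire the flag (b), and only then read the
 * payload (a). Returns true when an exit was requested and stop was set.
 */
static inline bool demo_cpu_should_stop(struct demo_cpu *cpu)
{
    if (!atomic_load_explicit(&cpu->exit_request, memory_order_acquire)) {
        return false;
    }
    /* Clearing the "note to self" needs no barrier in this sketch. */
    atomic_store_explicit(&cpu->exit_request, false, memory_order_relaxed);
    return cpu->stop;
}
```

Because the flag is written with release semantics after the payload and read with acquire semantics before it, a vCPU thread that observes the flag is guaranteed to observe the payload as well.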
Paolo Bonzini	871de7078f	treewide: rename qemu_wait_io_event/qemu_wait_io_event_common Do so before extending it to the user-mode emulators, where there is no such thing as an "I/O thread". Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-09-17 19:00:55 +02:00
Paolo Bonzini	f8217ae54e	cpus: properly kick CPUs out of inner execution loop Now that cpu_exit() actually kicks all accelerators, use it whenever the message to another thread is processed in qemu_wait_io_event(). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-09-17 19:00:55 +02:00
Paolo Bonzini	f084ff128b	accel: use atomic accesses for exit_request CPU threads write exit_request as a "note to self" that they need to go out to a slow path. This write happens out of the BQL and can be a data race with another threads' cpu_exit(); use atomic accesses consistently. While at it, change the source argument from int ("1") to bool ("true"). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-09-17 19:00:55 +02:00
Paolo Bonzini	ac6c8a390b	accel: use store_release/load_acquire for cross-thread exit_request Reads and writes cpu->exit_request do not use a load-acquire/store-release pair right now, but this means that cpu_exit() may not write cpu->exit_request after any flags that are read by the vCPU thread. Probably everything is protected one way or the other by the BQL, because cpu->exit_request leads to the slow path, where the CPU thread often takes the BQL (for example, to go to sleep by waiting on the BQL-protected cpu->halt_cond); but it's not clear, so use load-acquire/store-release consistently. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-09-17 19:00:55 +02:00
Paolo Bonzini	602d5ebba2	treewide: clear bits of cs->interrupt_request with cpu_reset_interrupt() Open coding cpu_reset_interrupt() can cause bugs if the BQL is not taken, for example i386 has the call chain kvm_cpu_exec() -> kvm_put_vcpu_events() -> kvm_arch_put_registers(). Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-09-17 19:00:55 +02:00
Paolo Bonzini	3efe1a0f60	target/i386: limit a20 to system emulation It is not used by user-mode emulation and is the only caller of cpu_interrupt() in qemu-i386 and qemu-x86_64. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-09-17 19:00:55 +02:00
Markus Armbruster	b2e4534a2c	i386/kvm/vmsr_energy: Plug memory leak on failure to connect socket vmsr_open_socket() leaks the Error set by qio_channel_socket_connect_sync(). Plug the leak by not creating the Error. Fixes: `0418f90809` (Add support for RAPL MSRs in KVM/Qemu) Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20250723133257.1497640-2-armbru@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>	2025-09-01 13:10:55 +02:00
Igor Mammedov	17e645c6f1	kvm: i386: irqchip: take BQL only if there is an interrupt When kernel-irqchip=split is used, QEMU still hits a BQL contention issue when reading ACPI PM/HPET timers (despite timer[s] access being lock-less). So Windows with more than 255 cpus is still not able to boot (since it requires iommu -> split irqchip). The problematic path is in kvm_arch_pre_run(), where BQL is taken unconditionally when split irqchip is in use. There are a few parts that BQL protects there: 1. interrupt check and injection: however, we do not take BQL when checking for a pending interrupt (even within the same function), so the patch takes the same approach for cpu->interrupt_request checks and takes BQL only if there is a job to do. 2. request_interrupt_window access: CPUState::kvm_run::request_interrupt_window doesn't need BQL as it's accessed by its own vCPU thread. 3. cr8/cpu_get_apic_tpr access: the same (as #2) applies to CPUState::kvm_run::cr8, and APIC registers are also cached/synced (get/put) within the vCPU thread it belongs to. Taking BQL only when necessary eliminates the BQL bottleneck on the IO/MMIO-only exit path, improving latency by 80% on an HPET micro benchmark. This lets Windows boot successfully (in case hv-time isn't used) when more than 255 vCPUs are in use. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20250814160600.2327672-8-imammedo@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-08-29 12:48:14 +02:00
Igor Mammedov	87511341c3	add cpu_test_interrupt()/cpu_set_interrupt() helpers and use them tree wide The helpers form load-acquire/store-release pair and ensure that appropriate barriers are in place in case checks happen outside of BQL. Use them to replace open-coded checkers/setters across the code, to make sure that barriers are not missed. Helpers also make code a bit more readable. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Link: https://lore.kernel.org/r/20250821155603.2422553-1-imammedo@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-08-29 12:48:14 +02:00
Zero Tang	c12cbaa007	i386/tcg/svm: fix incorrect canonicalization For all 32-bit systems and 64-bit Windows systems, "long" is 4 bytes long. Due to using "long" for a linear address, svm_canonicalization would set all high bits to 1 when (assuming 48-bit linear address) the segment base is bigger than 0x7FFF. This fixes booting guests under TCG when the guest IDT and GDT bases are above 0x7FFF, thereby resulting in incorrect bases. When an interrupt arrives, it would trigger a #PF exception; the #PF would trigger again, resulting in a #DF exception; the #PF would trigger for the third time, resulting in triple-fault, and eventually causes a shutdown VM-Exit to the hypervisor right after guest boot. Cc: qemu-stable@nongnu.org Signed-off-by: Zero Tang <zero.tangptr@gmail.com>	2025-08-27 10:57:03 +02:00
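The effect described in the commit above can be reproduced with a small sign-extension helper. This is a sketch, not the code from the patch: `demo_sign_extend()` and the two wrappers are hypothetical names, and the "4-byte long" behaviour is modelled arithmetically (sign-extending from bit `width - 33`) rather than by actually compiling with a 32-bit `long`.

```c
#include <stdint.h>

/* Sign-extend the low `width` bits of `value` to 64 bits (1 <= width <= 64). */
static inline uint64_t demo_sign_extend(uint64_t value, unsigned width)
{
    uint64_t sign = 1ULL << (width - 1);
    uint64_t mask = (width < 64) ? (1ULL << width) - 1 : ~0ULL;

    value &= mask;
    return (value ^ sign) - sign;
}

/* Canonicalization done on a full 64-bit value, as intended. */
static inline uint64_t demo_canonicalize(uint64_t base, unsigned addr_width)
{
    return demo_sign_extend(base, addr_width);
}

/*
 * Model of the same shift pair performed on a 4-byte type (addr_width > 32):
 * the value is truncated to 32 bits, so the sign bit that gets replicated is
 * bit (addr_width - 33) instead of bit (addr_width - 1).
 */
static inline uint64_t demo_canonicalize_4byte_long(uint64_t base,
                                                    unsigned addr_width)
{
    return demo_sign_extend(base, addr_width - 32);
}
```

With a 48-bit linear address, a base of 0x8000 is left untouched by `demo_canonicalize()` but becomes 0xFFFFFFFFFFFF8000 in the 4-byte model, which matches the "bigger than 0x7FFF" threshold in the message.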
Xin Wang	27535e9cca	target/i386: Add support for save/load of exception error code Currently, QEMU saves/loads CPU exception info (such as exception_nr and has_error_code), while the exception error_code is ignored. This causes the destination hypervisor to reinject a vCPU exception with error_code(0), potentially causing a guest kernel panic. For instance, if the source VM stopped with a user-mode write #PF (error_code 6), the destination hypervisor will reinject a #PF with error_code(0) when the vCPU resumes, and the guest kernel then panics with: BUG: unable to handle page fault for address: 00007f80319cb010 #PF: supervisor read access in user mode #PF: error_code(0x0000) - not-present page RIP: 0033:0x40115d To fix it, support save/load of the exception error_code. Signed-off-by: Xin Wang <wangxinxin.wang@huawei.com> Link: https://lore.kernel.org/r/20250819145834.3998-1-wangxinxin.wang@huawei.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-08-20 22:47:43 +02:00
Paolo Bonzini	faaaf017d5	Revert "i386/cpu: Warn about why CPUID_EXT_PDCM is not available" This reverts commit `00268e0002`. (The only conflict is in the !is_tdx_vm() part of the condition, which is safe to keep). mark_unavailable_features() actively blocks usage of the feature, so it is a functional change, not merely emitting a warning. The commit was intended to merely warn if PDCM was enabled when the performance counters are not, so revert it. Reported-by: Christian A. Ehrhardt <christian.ehrhardt@canonical.com> Analyzed-by: Daniel P. Berrangé <berrange@redhat.com> Analyzed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-ID: <20250819150235.785559-1-pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2025-08-19 14:05:56 -04:00
Zhao Liu	4e5d58969e	target/i386/cpu: Move addressable ID encoding out of compat property in CPUID[0x1] Currently, the addressable ID encoding for CPUID[0x1].EBX[bits 16-23] (Maximum number of addressable IDs for logical processors in this physical package) is covered by vendor_cpuid_only_v2 compat property. The previous consideration was to avoid breaking migration and this compat property makes it unfriendly to backport the commit `f985a1195b` ("i386/cpu: Fix number of addressable IDs field for CPUID.01H.EBX [23:16]"). However, NetBSD booting is broken since the commit `88dd4ca06c` ("i386/cpu: Use APIC ID info to encode cache topo in CPUID[4]"), because NetBSD calculates smt information via `lp_max` / `core_max` for legacy Intel CPUs which doesn't support 0xb leaf, where `lp_max` is from CPUID[0x1].EBX.bits[16-23] and `core_max` is from CPUID[0x4].0x0.bits[26 -31]. The commit `88dd4ca0` changed the encoding rule of `core_max` but didn't update `lp_max`, so that NetBSD would get the wrong smt information, which leads to the module loading failure. Luckily, the commit `f985a1195b` ("i386/cpu: Fix number of addressable IDs field for CPUID.01H.EBX[23:16]") updated the encoding rule for `lp_max` and accidentally fixed the NetBSD issue too. This also shows that using CPUID[0x1] and CPUID[0x4].0x0 to calculate HT/SMT information is a common practice to detect CPU topology on legacy Intel CPUs. Therefore, it's necessary to backport the commit `f985a1195b` to previous stable QEMU to help address the similar issues as well. Then the compat property is not needed any more since all stable QEMUs will follow the same encoding way. So, in CPUID[0x1], move addressable ID encoding out of compat property. 
Reported-by: Michael Tokarev <mjt@tls.msk.ru> Inspired-by: Chuang Xu <xuchuangxclwt@bytedance.com> Fixes: commit `f985a1195b` ("i386/cpu: Fix number of addressable IDs field for CPUID.01H.EBX[23:16]") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3061 Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Tested-by: Michael Tokarev <mjt@tls.msk.ru> Message-ID: <20250804053548.1808629-1-zhao1.liu@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>	2025-08-05 17:30:29 +02:00
Paolo Bonzini	feea87cd6b	target/i386: fix width of third operand of VINSERTx128 Table A-5 of the Intel manual incorrectly lists the third operand of VINSERTx128 as Wqq, but it is actually a 128-bit value. This is visible when W is a memory operand close to the end of the page. Fixes the recently-added poly1305_kunit test in linux-next. (No testcase yet, but I plan to modify test-avx2 to use memory close to the end of the page. This would work because the test vectors correctly have the memory operand as xmm2/m128). Reported-by: Eric Biggers <ebiggers@kernel.org> Tested-by: Eric Biggers <ebiggers@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: qemu-stable@nongnu.org Fixes: `7906847768` ("target/i386: reimplement 0x0f 0x3a, add AVX", 2022-10-18) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-25 14:51:11 +02:00
Xiaoyao Li	f64832033d	i386/tdx: Remove the redundant qemu_mutex_init(&tdx->lock) Commit `40da501d89` ("i386/tdx: handle TDG.VP.VMCALL<GetQuote>") added redundant qemu_mutex_init(&tdx->lock) in tdx_guest_init by mistake. Fix it by removing the redundant one. Fixes: `40da501d89` ("i386/tdx: handle TDG.VP.VMCALL<GetQuote>") Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Link: https://lore.kernel.org/r/20250717103707.688929-1-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-17 17:18:59 +02:00
Xiaoyao Li	5fe6b9a854	i386/cpu: Cleanup host_cpu_max_instance_init() The implementation of host_cpu_max_instance_init() was merged into host_cpu_instance_init() by commit `29f1ba338b` ("target/i386: merge host_cpu_instance_init() and host_cpu_max_instance_init()"), while the declaration of it remains in host-cpu.h. Clean it up. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250716063117.602050-1-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-17 17:18:59 +02:00
Paolo Bonzini	f2b7879763	target/i386: tdx: fix locking for interrupt injection Take tdx_guest->lock when injecting the event notification interrupt into the guest. Fixes CID 1612364. Reported-by: Peter Maydell <peter.maydell@linaro.org> Cc: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-17 17:18:59 +02:00
Zhao Liu	e52af92e9e	i386/cpu: Move x86_ext_save_areas[] initialization to .instance_init In x86_cpu_post_initfn(), the initialization of x86_ext_save_areas[] marks the unsupported xsave areas based on Host support. This step must be done before accel_cpu_instance_init(), otherwise, KVM's assertion on host xsave support would fail: qemu-system-x86_64: ../target/i386/kvm/kvm-cpu.c:149: kvm_cpu_xsave_init: Assertion `esa->size == eax' failed. (on AMD EPYC 7302 16-Core Processor) Move x86_ext_save_areas[] initialization to .instance_init and place it before accel_cpu_instance_init(). Fixes: commit `5f158abef4` ("target/i386: move accel_cpu_instance_init to .instance_init") Reported-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250717023933.2502109-1-zhao1.liu@intel.com Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-17 15:51:29 +02:00
Paolo Bonzini	d3a24134e3	target/i386: do not expose ARCH_CAPABILITIES on AMD CPU KVM emulates the ARCH_CAPABILITIES on x86 for both Intel and AMD cpus, although the IA32_ARCH_CAPABILITIES MSR is an Intel-specific MSR and it makes no sense to emulate it on AMD. As a consequence, VMs created on AMD with qemu -cpu host and using KVM will advertise the ARCH_CAPABILITIES feature and provide the IA32_ARCH_CAPABILITIES MSR. This can cause issues (like Windows BSOD) as the guest OS might not expect this MSR to exist on such cpus (the AMD documentation specifies that ARCH_CAPABILITIES feature and MSR are not defined on the AMD architecture). A fix was proposed in KVM code, however KVM maintainers don't want to change this behavior that exists for 6+ years and suggest changes to be done in QEMU instead. Therefore, hide the bit from "-cpu host": migration of -cpu host guests is only possible between identical host kernel and QEMU versions, therefore this is not a problematic breakage. If a future AMD machine does include the MSR, that would re-expose the Windows guest bug; but it would not be KVM/QEMU's problem at that point, as we'd be following a genuine physical CPU impl. Reported-by: Alexandre Chartre <alexandre.chartre@oracle.com> Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-17 15:50:45 +02:00
Stefan Hajnoczi	f96b157ebb	Merge tag 'accel-20250715' of https://github.com/philmd/qemu into staging Accelerators patches - Unify x86/arm hw/xen/arch_hvm.h header - Move non-system-specific 'accel/accel-ops.h' and 'accel-cpu-ops.h' to accel/ - Move KVM definitions qapi/accelerator.json - Add @qom-type field to CpuInfoFast QAPI structure - Display CPU model name in 'info cpus' HMP command - Introduce @x-accel-stats QMP command - Add 'info accel' on HMP - Improve qemu_add_vm_change_state_handler() docstring - Extract TCG statistic related code to tcg-stats.c - Implement AccelClass::get_[vcpu]_stats() handlers for TCG and HVF - Do not dump NaN in TCG statistics - Revert incomplete "accel/tcg: Unregister the RCU before exiting RR thread" # gpg: Signature made Tue 15 Jul 2025 15:44:05 EDT # gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE # gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full] # Primary key fingerprint: FAAB E75E 1291 7221 DCFD 6BB2 E3E3 2C2C DEAD C0DE * tag 'accel-20250715' of https://github.com/philmd/qemu: system/runstate: Document qemu_add_vm_change_state_handler_prio* in hdr system/runstate: Document qemu_add_vm_change_state_handler() accel/hvf: Implement AccelClass::get_vcpu_stats() handler accel/tcg: Implement AccelClass::get_stats() handler accel/tcg: Propagate AccelState to dump_accel_info() accel/system: Add 'info accel' on human monitor accel/system: Introduce @x-accel-stats QMP command accel/tcg: Extract statistic related code to tcg-stats.c Revert "accel/tcg: Unregister the RCU before exiting RR thread" accel: Extract AccelClass definition to 'accel/accel-ops.h' accel: Rename 'system/accel-ops.h' -> 'accel/accel-cpu-ops.h' accel/tcg: Do not dump NaN statistics hw/core/machine: Display CPU model name in 'info cpus' command qapi/machine: Add @qom-type field to CpuInfoFast structure qapi/accel: Move definitions related to accelerators in their own file hw/arm/xen-pvh: Remove unnecessary 'hw/xen/arch_hvm.h' header hw/xen/arch_hvm: Unify x86 and ARM variants Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Conflicts: qapi/machine.json Commit `0462da9d6b` ("qapi: remove trivial "Returns:" sections") removed trivial "Returns:". This caused a conflict with the move from machine.json to accelerator.json.	2025-07-16 07:13:40 -04:00
Stefan Hajnoczi	e452053097	Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging virtio,pci,pc: features, fixes, tests SPCR acpi table can now be disabled vhost-vdpa can now report hashing capability to guest PPTT acpi table now tells guest vCPUs are identical vhost-user-blk now shuts down faster loongarch64 now supports bios-tables-test intel_iommu now supports ATS cxl now supports DCD Fabric Management Command Set arm now supports acpi pci hotplug fixes, cleanups Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Tue 15 Jul 2025 02:56:48 EDT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (97 commits) hw/cxl: mailbox-utils: 0x5605 - FMAPI Initiate DC Release hw/cxl: mailbox-utils: 0x5604 - FMAPI Initiate DC Add hw/cxl: Create helper function to create DC Event Records from extents hw/cxl: mailbox-utils: 0x5603 - FMAPI Get DC Region Extent Lists hw/cxl: mailbox-utils: 0x5602 - FMAPI Set DC Region Config hw/mem: cxl_type3: Add DC Region bitmap lock hw/cxl: Move definition for dynamic_capacity_uuid and enum for DC event types to header hw/cxl: mailbox-utils: 0x5601 - FMAPI Get Host Region Config hw/mem: cxl_type3: Add dsmas_flags to CXLDCRegion struct hw/cxl: mailbox-utils: 0x5600 - FMAPI Get DCD Info hw/cxl: fix DC extent capacity tracking tests: virt: Update expected ACPI tables for virt test hw/acpi/aml-build: Build a root node in the PPTT table hw/acpi/aml-build: Set identical implementation flag for PPTT processor nodes tests: virt: Allow changes to PPTT test table qtest/bios-tables-test: Generate reference blob for DSDT.acpipcihp qtest/bios-tables-test: Generate reference blob for DSDT.hpoffacpiindex tests/qtest/bios-tables-test: Add aarch64 ACPI PCI hotplug test tests/qtest/bios-tables-test: Prepare for addition of acpi pci hp tests hw/arm/virt: Let virt support pci hotplug/unplug GED event ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Conflicts: net/vhost-vdpa.c vhost_vdpa_set_steering_ebpf() was removed, resolve the context conflict.	2025-07-16 07:00:47 -04:00
Philippe Mathieu-Daudé	f7a7e7dd21	accel: Extract AccelClass definition to 'accel/accel-ops.h' Only accelerator implementations (and the common accelerator code) need to know about AccelClass internals. Move the definition out but forward declare AccelState and AccelClass. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20250703173248.44995-39-philmd@linaro.org>	2025-07-15 19:34:33 +02:00
Philippe Mathieu-Daudé	05927e9dc9	accel: Rename 'system/accel-ops.h' -> 'accel/accel-cpu-ops.h' Unfortunately "system/accel-ops.h" handlers are not only system-specific. For example, the cpu_reset_hold() hook is part of the vCPU creation, after it is realized. Mechanical rename to drop 'system' using: $ sed -i -e s_system/accel-ops.h_accel/accel-cpu-ops.h_g \ $(git grep -l system/accel-ops.h) Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20250703173248.44995-38-philmd@linaro.org>	2025-07-15 19:34:33 +02:00
Philippe Mathieu-Daudé	0f64fb6743	qemu: Declare all load/store helper in 'qemu/bswap.h' Restrict "exec/tswap.h" to the tswap*() methods, move the load/store helpers with the other ones declared in "qemu/bswap.h". Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Message-Id: <20250708215320.70426-8-philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2025-07-15 02:56:39 -04:00
Zhao Liu	5d21ee453a	i386/cpu: Honor maximum value for CPUID.8000001DH.EAX[25:14] CPUID.8000001DH:EAX[25:14] is "NumSharingCache", and the number of logical processors sharing this cache is the value of this field incremented by 1. Because of its width limitation, the maximum value currently supported is 4095. Though at present Q35 supports up to 4096 CPUs, by constructing a specific topology, the width of the APIC ID can be extended beyond 12 bits. For example, using `-smp threads=33,cores=9,modules=9` results in a die level offset of 6 + 4 + 4 = 14 bits, which can also cause overflow. Check and honor the maximum value as CPUID.04H did. Cc: Babu Moger <babu.moger@amd.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250714080859.1960104-8-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-14 10:29:17 +02:00
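The "6 + 4 + 4 = 14 bits" arithmetic in the commit above, and the resulting overflow of a 12-bit field, can be checked with a few lines of C. This is a rough sketch under assumptions: the helper names are hypothetical, and it assumes the sharing count is `1 << offset` for the chosen topology level, which the commit message implies but does not spell out.

```c
#include <stdint.h>

/* Number of APIC ID bits needed to number `count` siblings: ceil(log2(count)). */
static inline unsigned demo_bits_for_count(uint32_t count)
{
    unsigned bits = 0;

    while (bits < 31 && (1u << bits) < count) {
        bits++;
    }
    return bits;
}

/*
 * Value for a "number sharing this cache minus one" field, clamped to the
 * largest value the field can hold (4095 for a 12-bit field).
 */
static inline uint32_t demo_num_sharing_field(unsigned offset_bits,
                                              uint32_t field_max)
{
    uint32_t value = (1u << offset_bits) - 1;

    return value > field_max ? field_max : value;
}
```

For `-smp threads=33,cores=9,modules=9` the three widths are 6, 4 and 4 bits, so the unclamped field value would be 16383, well above the 4095 that bits [25:14] can encode.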
Qian Wen	3e86124e7c	i386/cpu: Fix overflow of cache topology fields in CPUID.04H According to the SDM, CPUID.0x4:EAX[31:26] indicates the maximum number of addressable IDs for processor cores in the physical package. If we launch a VM with more than 64 cores, the 6-bit field will overflow, and the wrong core_id number will be reported. Since the HW reports 0x3f when the Intel processor has more than 64 cores, limit the max value written to EAX[31:26] to 63, so max num_cores should be 64. For EAX[25:14], though at present Q35 supports up to 4096 CPUs, by constructing a specific topology, the width of the APIC ID can be extended beyond 12 bits. For example, using `-smp threads=33,cores=9,modules=9` results in a die level offset of 6 + 4 + 4 = 14 bits, which can also cause overflow. Check and honor the maximum value for EAX[25:14] as well. In addition, for the host-cache-info case, also apply the same checks and fixes. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Qian Wen <qian.wen@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250714080859.1960104-7-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-14 10:29:17 +02:00
Qian Wen	a62fef5829	i386/cpu: Fix cpu number overflow in CPUID.01H.EBX[23:16] The legacy topology enumerated by CPUID.1.EBX[23:16] is defined in SDM Vol2: Bits 23-16: Maximum number of addressable IDs for logical processors in this physical package. When threads_per_socket > 255, it will 1) overwrite bits[31:24] which is apic_id, 2) bits [23:16] get truncated. Specifically, if launching the VM with -smp 256, the value written to EBX[23:16] is 0 because of data overflow. If the guest only supports legacy topology, without V2 Extended Topology enumerated by CPUID.0x1f or Extended Topology enumerated by CPUID.0x0b to support over 255 CPUs, the return of the kernel invoking cpu_smt_allowed() is false and APs (application processors) will fail to bring up. Then only CPU 0 is online, and others are offline. For example, launch VM via: qemu-system-x86_64 -M q35,accel=kvm,kernel-irqchip=split \ -cpu qemu64,cpuid-0xb=off -smp 256 -m 32G \ -drive file=guest.img,if=none,id=virtio-disk0,format=raw \ -device virtio-blk-pci,drive=virtio-disk0,bootindex=1 --nographic The guest shows: CPU(s): 256 On-line CPU(s) list: 0 Off-line CPU(s) list: 1-255 To avoid this issue caused by overflow, limit the max value written to EBX[23:16] to 255 as the HW does. Cc: qemu-stable@nongnu.org Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Qian Wen <qian.wen@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250714080859.1960104-6-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-14 10:29:17 +02:00
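The overflow described in the entry above can be reproduced in isolation. This is a minimal sketch, not QEMU's code: the function names are hypothetical, and it only models the layout stated in the commit message (bits [31:24] hold the APIC ID, bits [23:16] hold the logical processor count).

```c
#include <assert.h>
#include <stdint.h>

/* Unclamped encoding: a count above 255 spills into the APIC ID byte. */
static uint32_t ebx_unclamped(uint32_t apic_id, uint32_t threads_per_pkg)
{
    assert(apic_id <= 0xFF);
    return (apic_id << 24) | (threads_per_pkg << 16);
}

/* Clamped encoding: limit the count to 255, as the commit says the HW does. */
static uint32_t ebx_clamped(uint32_t apic_id, uint32_t threads_per_pkg)
{
    uint32_t n = threads_per_pkg > 255 ? 255 : threads_per_pkg;

    assert(apic_id <= 0xFF);
    return (apic_id << 24) | (n << 16);
}

/* Extract EBX[23:16]. */
static uint32_t ebx_logical_count(uint32_t ebx)
{
    return (ebx >> 16) & 0xFF;
}

/* Extract EBX[31:24]. */
static uint32_t ebx_apic_id(uint32_t ebx)
{
    return ebx >> 24;
}
```

For `-smp 256` on CPU 0, the unclamped form yields a count of 0 and corrupts the APIC ID byte to 1, while the clamped form yields a count of 255 and leaves the APIC ID at 0.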
Chuang Xu	f985a1195b	i386/cpu: Fix number of addressable IDs field for CPUID.01H.EBX[23:16] When QEMU is started with: -cpu host,migratable=on,host-cache-info=on,l3-cache=off -smp 180,sockets=2,dies=1,cores=45,threads=2 On Intel platform: CPUID.01H.EBX[23:16] is defined as "max number of addressable IDs for logical processors in the physical package". When executing "cpuid -1 -l 1 -r" in the guest, we obtain a value of 90 for CPUID.01H.EBX[23:16], whereas the expected value is 128. Additionally, executing "cpuid -1 -l 4 -r" in the guest yields a value of 63 for CPUID.04H.EAX[31:26], which matches the expected result. As (1+CPUID.04H.EAX[31:26]) rounds up to the nearest power-of-2 integer, it's necessary to round up CPUID.01H.EBX[23:16] to the nearest power-of-2 integer too. Otherwise there would be unexpected results in guest with older kernel. For example, when QEMU is started with CLI above and xtopology is disabled, guest kernel 5.15.120 uses CPUID.01H.EBX[23:16]/(1+CPUID.04H.EAX[31:26]) to calculate threads-per-core in detect_ht(). Then guest will get "90/(1+63)=1" as the result, even though threads-per-core should actually be 2. And on AMD platform: CPUID.01H.EBX[23:16] is defined as "Logical processor count". Current result meets our expectation. So round up CPUID.01H.EBX[23:16] to the nearest power-of-2 integer only for Intel platform to solve the unexpected result. Use the "x-vendor-cpuid-only-v2" compat option to fix this issue. Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Guixiong Wei <weiguixiong@bytedance.com> Signed-off-by: Yipeng Yin <yinyipeng@bytedance.com> Signed-off-by: Chuang Xu <xuchuangxclwt@bytedance.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250714080859.1960104-5-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-14 10:29:17 +02:00
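The "90/(1+63)=1" result quoted in the entry above follows from plain integer division. The sketch below is a simplified model of the calculation the commit message attributes to the guest kernel's detect_ht(); it is not the kernel's code, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round up to the nearest power of two (v must be in 1..2^31). */
static uint32_t pow2_round_up(uint32_t v)
{
    uint32_t p = 1;

    assert(v >= 1 && v <= 0x80000000u);
    while (p < v) {
        p <<= 1;
    }
    return p;
}

/*
 * Threads per core as a legacy guest would derive it:
 * CPUID.01H.EBX[23:16] / (1 + CPUID.04H.EAX[31:26]), using integer division.
 */
static uint32_t legacy_threads_per_core(uint32_t ebx_23_16, uint32_t eax4_31_26)
{
    return ebx_23_16 / (1 + eax4_31_26);
}
```

With the topology from the commit message (90 logical processors per package, EAX[31:26] = 63), the unrounded value gives 90 / 64 = 1, while the rounded value 128 gives 128 / 64 = 2, which matches the configured threads=2.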
Zhao Liu	075e91a4a4	i386/cpu: Reorder CPUID leaves in cpu_x86_cpuid() Sort the CPUID leaves strictly by index to facilitate checking and changing. Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Tao Su <tao1.su@linux.intel.com> Link: https://lore.kernel.org/r/20250627035129.2755537-5-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-14 10:29:17 +02:00
Zhao Liu	da84c01154	i386/cpu: Mark CPUID 0x80000008 ECX bits[0:7] & [12:15] as reserved for Intel/Zhaoxin Per SDM, 80000008H EAX Linear/Physical Address size. Bits 07-00: #Physical Address Bits*. Bits 15-08: #Linear Address Bits. Bits 31-16: Reserved = 0. EBX Bits 08-00: Reserved = 0. Bit 09: WBNOINVD is available if 1. Bits 31-10: Reserved = 0. ECX Reserved = 0. EDX Reserved = 0. ECX/EDX in CPUID 0x80000008 leaf are reserved. Currently, in QEMU, only ECX bits[0:7] and ECX bits[12:15] are encoded, and both are emulated in QEMU. Considering that Intel and Zhaoxin are already using the 0x1f leaf to describe CPU topology, which includes similar information, Intel and Zhaoxin will not implement ECX bits[0:7] and bits[12:15] of 0x80000008. Therefore, mark these two fields as reserved and clear them for Intel and Zhaoxin guests. Reviewed-by: Tao Su <tao1.su@linux.intel.com> Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250714080859.1960104-3-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-14 10:29:12 +02:00
Zhao Liu	1c52c470ba	i386/cpu: Mark CPUID 0x80000007[EBX] as reserved for Intel Per SDM, 80000007H EAX Reserved = 0. EBX Reserved = 0. ECX Reserved = 0. EDX Bits 07-00: Reserved = 0. Bit 08: Invariant TSC available if 1. Bits 31-09: Reserved = 0. EAX/EBX/ECX in CPUID 0x80000007 leaf are reserved for Intel. At present, EAX is reserved for AMD, too. And AMD hasn't used ECX in QEMU. So these 2 registers are both left as 0. Therefore, only fix the EBX and encode it as 0 for Intel. Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Tao Su <tao1.su@linux.intel.com> Link: https://lore.kernel.org/r/20250627035129.2755537-3-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-14 10:27:07 +02:00
Zhao Liu	a539cd2614	i386/cpu: Mark EBX/ECX/EDX in CPUID 0x80000000 leaf as reserved for Intel Per SDM, 80000000H EAX Maximum Input Value for Extended Function CPUID Information. EBX Reserved. ECX Reserved. EDX Reserved. EBX/ECX/EDX in CPUID 0x80000000 leaf are reserved. Intel is using 0x0 leaf to encode vendor. Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Tao Su <tao1.su@linux.intel.com> Link: https://lore.kernel.org/r/20250627035129.2755537-2-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-14 10:27:07 +02:00
Zhao Liu	8113b7f0e6	i386/cpu: Enable 0x1f leaf for YongFeng by default Host YongFeng CPU has 0x1f leaf by default, so enable it for Guest CPU by default as well. Suggested-by: Ewan Hai <ewanhai-oc@zhaoxin.com> Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250711104603.1634832-10-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:22 +02:00
Zhao Liu	1bf465f3e0	i386/cpu: Enable 0x1f leaf for SapphireRapids by default Host SapphireRapids CPU has 0x1f leaf by default, so enable it for Guest CPU by default as well. Suggested-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250711104603.1634832-9-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:22 +02:00
Zhao Liu	5dbdcdce06	i386/cpu: Enable 0x1f leaf for GraniteRapids by default Host GraniteRapids CPU has 0x1f leaf by default, so enable it for Guest CPU by default as well. Suggested-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250711104603.1634832-8-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:22 +02:00
Zhao Liu	af62bd3db0	i386/cpu: Enable 0x1f leaf for SierraForest by default Host SierraForest CPU has 0x1f leaf by default, so enable it for Guest CPU by default as well. Suggested-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250711104603.1634832-7-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:22 +02:00
Zhao Liu	e468c5e444	i386/cpu: Enable 0x1f leaf for SierraForest by default Host SierraForest CPU has 0x1f leaf by default, so enable it for Guest CPU by default as well. Suggested-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250711104603.1634832-7-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:22 +02:00
Manish Mishra	bf40325614	i386/cpu: Add a "x-force-cpuid-0x1f" property Add a "x-force-cpuid-0x1f" property so that CPU models can enable it and have 0x1f CPUID leaf naturally as the Host CPU. The advantage is that when the CPU model's cache model is already consistent with the Host CPU, for example, SRF defaults to l2 per module & l3 per package, 0x1f can better help users identify the topology in the VM. Adding 0x1f for specific CPU models should not cause any trouble in principle. This property is only enabled for CPU models that already have 0x1f leaf on the Host, so software that originally runs normally on the Host won't encounter issues in the Guest with corresponding CPU model. Conversely, some software that relies on checking 0x1f might have problems in the Guest due to the lack of 0x1f []. In summary, adding 0x1f is also intended to further emulate the Host CPU environment. []: https://lore.kernel.org/qemu-devel/PH0PR02MB738410511BF51B12DB09BE6CF6AC2@PH0PR02MB7384.namprd02.prod.outlook.com/ Signed-off-by: Manish Mishra <manish.mishra@nutanix.com> Co-authored-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> [Integrated and rebased 2 previous patches (ordered by post time)] Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250711104603.1634832-6-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:22 +02:00
Ewan Hai	b1a3a090b2	i386/cpu: Introduce cache model for YongFeng Add the cache model to YongFeng (v3) to better emulate its environment. Note, although YongFeng v2 was added after v10.0, it was also back ported to v10.0.2. Therefore, the new version (v3) is needed to avoid conflict. The cache model is as follows: --- cache 0 --- cache type = data cache (1) cache level = 0x1 (1) self-initializing cache level = true fully associative cache = false maximum IDs for CPUs sharing cache = 0x0 (0) maximum IDs for cores in pkg = 0x0 (0) system coherency line size = 0x40 (64) physical line partitions = 0x1 (1) ways of associativity = 0x8 (8) number of sets = 0x40 (64) WBINVD/INVD acts on lower caches = false inclusive to lower caches = false complex cache indexing = false number of sets (s) = 64 (size synth) = 32768 (32 KB) --- cache 1 --- cache type = instruction cache (2) cache level = 0x1 (1) self-initializing cache level = true fully associative cache = false maximum IDs for CPUs sharing cache = 0x0 (0) maximum IDs for cores in pkg = 0x0 (0) system coherency line size = 0x40 (64) physical line partitions = 0x1 (1) ways of associativity = 0x10 (16) number of sets = 0x40 (64) WBINVD/INVD acts on lower caches = false inclusive to lower caches = false complex cache indexing = false number of sets (s) = 64 (size synth) = 65536 (64 KB) --- cache 2 --- cache type = unified cache (3) cache level = 0x2 (2) self-initializing cache level = true fully associative cache = false maximum IDs for CPUs sharing cache = 0x0 (0) maximum IDs for cores in pkg = 0x0 (0) system coherency line size = 0x40 (64) physical line partitions = 0x1 (1) ways of associativity = 0x8 (8) number of sets = 0x200 (512) WBINVD/INVD acts on lower caches = false inclusive to lower caches = true complex cache indexing = false number of sets (s) = 512 (size synth) = 262144 (256 KB) --- cache 3 --- cache type = unified cache (3) cache level = 0x3 (3) self-initializing cache level = true fully associative cache = 
false maximum IDs for CPUs sharing cache = 0x0 (0) maximum IDs for cores in pkg = 0x0 (0) system coherency line size = 0x40 (64) physical line partitions = 0x1 (1) ways of associativity = 0x10 (16) number of sets = 0x2000 (8192) WBINVD/INVD acts on lower caches = true inclusive to lower caches = true complex cache indexing = false number of sets (s) = 8192 (size synth) = 8388608 (8 MB) --- cache 4 --- cache type = no more caches (0) Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Ewan Hai <ewanhai-oc@zhaoxin.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250711104603.1634832-5-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:22 +02:00
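The "(size synth)" values in the cache dump above can be cross-checked from the other fields: ways of associativity × physical line partitions × line size × number of sets. The struct and function below are a hypothetical sketch for that check and are not QEMU's cache model types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the four CPUID.04H-style geometry fields. */
struct cache_geometry {
    uint32_t ways;        /* ways of associativity */
    uint32_t partitions;  /* physical line partitions */
    uint32_t line_size;   /* system coherency line size, in bytes */
    uint32_t sets;        /* number of sets */
};

/* Synthesized cache size in bytes: ways * partitions * line_size * sets. */
static uint64_t cache_size_bytes(const struct cache_geometry *g)
{
    assert(g->ways && g->partitions && g->line_size && g->sets);
    return (uint64_t)g->ways * g->partitions * g->line_size * g->sets;
}

/* Geometry values copied from the YongFeng dump above. */
static const struct cache_geometry yongfeng_l1d = { 8, 1, 64, 64 };
static const struct cache_geometry yongfeng_l1i = { 16, 1, 64, 64 };
static const struct cache_geometry yongfeng_l2  = { 8, 1, 64, 512 };
static const struct cache_geometry yongfeng_l3  = { 16, 1, 64, 8192 };
```

Applied to the YongFeng values, the products are 32768 (32 KB), 65536 (64 KB), 262144 (256 KB) and 8388608 (8 MB), matching the sizes listed in the commit message.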
Zhao Liu	8d69fc2158	i386/cpu: Introduce cache model for SapphireRapids Add the cache model to SapphireRapids (v4) to better emulate its environment. The cache model is based on SapphireRapids-SP (Scalable Performance): --- cache 0 --- cache type = data cache (1) cache level = 0x1 (1) self-initializing cache level = true fully associative cache = false maximum IDs for CPUs sharing cache = 0x1 (1) maximum IDs for cores in pkg = 0x3f (63) system coherency line size = 0x40 (64) physical line partitions = 0x1 (1) ways of associativity = 0xc (12) number of sets = 0x40 (64) WBINVD/INVD acts on lower caches = false inclusive to lower caches = false complex cache indexing = false number of sets (s) = 64 (size synth) = 49152 (48 KB) --- cache 1 --- cache type = instruction cache (2) cache level = 0x1 (1) self-initializing cache level = true fully associative cache = false maximum IDs for CPUs sharing cache = 0x1 (1) maximum IDs for cores in pkg = 0x3f (63) system coherency line size = 0x40 (64) physical line partitions = 0x1 (1) ways of associativity = 0x8 (8) number of sets = 0x40 (64) WBINVD/INVD acts on lower caches = false inclusive to lower caches = false complex cache indexing = false number of sets (s) = 64 (size synth) = 32768 (32 KB) --- cache 2 --- cache type = unified cache (3) cache level = 0x2 (2) self-initializing cache level = true fully associative cache = false maximum IDs for CPUs sharing cache = 0x1 (1) maximum IDs for cores in pkg = 0x3f (63) system coherency line size = 0x40 (64) physical line partitions = 0x1 (1) ways of associativity = 0x10 (16) number of sets = 0x800 (2048) WBINVD/INVD acts on lower caches = false inclusive to lower caches = false complex cache indexing = false number of sets (s) = 2048 (size synth) = 2097152 (2 MB) --- cache 3 --- cache type = unified cache (3) cache level = 0x3 (3) self-initializing cache level = true fully associative cache = false maximum IDs for CPUs sharing cache = 0x7f (127) maximum IDs for cores in pkg = 0x3f 
(63) system coherency line size = 0x40 (64) physical line partitions = 0x1 (1) ways of associativity = 0xf (15) number of sets = 0x10000 (65536) WBINVD/INVD acts on lower caches = false inclusive to lower caches = false complex cache indexing = true number of sets (s) = 65536 (size synth) = 62914560 (60 MB) --- cache 4 --- cache type = no more caches (0) Suggested-by: Tejus GK <tejus.gk@nutanix.com> Suggested-by: Jason Zeng <jason.zeng@intel.com> Suggested-by: "Daniel P . Berrangé" <berrange@redhat.com> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Reviewed-by: Tao Su <tao1.su@linux.intel.com> Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250711104603.1634832-4-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:22 +02:00
Zhao Liu	23fb57838d	i386/cpu: Introduce cache model for GraniteRapids Add the cache model to GraniteRapids (v3) to better emulate its environment. The cache model is based on GraniteRapids-SP (Scalable Performance): --- cache 0 --- cache type = data cache (1) cache level = 0x1 (1) self-initializing cache level = true fully associative cache = false maximum IDs for CPUs sharing cache = 0x1 (1) maximum IDs for cores in pkg = 0x3f (63) system coherency line size = 0x40 (64) physical line partitions = 0x1 (1) ways of associativity = 0xc (12) number of sets = 0x40 (64) WBINVD/INVD acts on lower caches = false inclusive to lower caches = false complex cache indexing = false number of sets (s) = 64 (size synth) = 49152 (48 KB) --- cache 1 --- cache type = instruction cache (2) cache level = 0x1 (1) self-initializing cache level = true fully associative cache = false maximum IDs for CPUs sharing cache = 0x1 (1) maximum IDs for cores in pkg = 0x3f (63) system coherency line size = 0x40 (64) physical line partitions = 0x1 (1) ways of associativity = 0x10 (16) number of sets = 0x40 (64) WBINVD/INVD acts on lower caches = false inclusive to lower caches = false complex cache indexing = false number of sets (s) = 64 (size synth) = 65536 (64 KB) --- cache 2 --- cache type = unified cache (3) cache level = 0x2 (2) self-initializing cache level = true fully associative cache = false maximum IDs for CPUs sharing cache = 0x1 (1) maximum IDs for cores in pkg = 0x3f (63) system coherency line size = 0x40 (64) physical line partitions = 0x1 (1) ways of associativity = 0x10 (16) number of sets = 0x800 (2048) WBINVD/INVD acts on lower caches = false inclusive to lower caches = false complex cache indexing = false number of sets (s) = 2048 (size synth) = 2097152 (2 MB) --- cache 3 --- cache type = unified cache (3) cache level = 0x3 (3) self-initializing cache level = true fully associative cache = false maximum IDs for CPUs sharing cache = 0xff (255) maximum IDs for cores in pkg = 0x3f 
(63) system coherency line size = 0x40 (64) physical line partitions = 0x1 (1) ways of associativity = 0x10 (16) number of sets = 0x48000 (294912) WBINVD/INVD acts on lower caches = false inclusive to lower caches = false complex cache indexing = true number of sets (s) = 294912 (size synth) = 301989888 (288 MB) --- cache 4 --- cache type = no more caches (0) Suggested-by: Tejus GK <tejus.gk@nutanix.com> Suggested-by: Jason Zeng <jason.zeng@intel.com> Suggested-by: "Daniel P . Berrangé" <berrange@redhat.com> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Reviewed-by: Tao Su <tao1.su@linux.intel.com> Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250711104603.1634832-3-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:22 +02:00
Zhao Liu	1f0a9ce6c0	i386/cpu: Introduce cache model for SierraForest Add the cache model to SierraForest (v3) to better emulate its environment. The cache model is based on SierraForest-SP (Scalable Performance): --- cache 0 --- cache type = data cache (1) cache level = 0x1 (1) self-initializing cache level = true fully associative cache = false maximum IDs for CPUs sharing cache = 0x0 (0) maximum IDs for cores in pkg = 0x3f (63) system coherency line size = 0x40 (64) physical line partitions = 0x1 (1) ways of associativity = 0x8 (8) number of sets = 0x40 (64) WBINVD/INVD acts on lower caches = false inclusive to lower caches = false complex cache indexing = false number of sets (s) = 64 (size synth) = 32768 (32 KB) --- cache 1 --- cache type = instruction cache (2) cache level = 0x1 (1) self-initializing cache level = true fully associative cache = false maximum IDs for CPUs sharing cache = 0x0 (0) maximum IDs for cores in pkg = 0x3f (63) system coherency line size = 0x40 (64) physical line partitions = 0x1 (1) ways of associativity = 0x8 (8) number of sets = 0x80 (128) WBINVD/INVD acts on lower caches = false inclusive to lower caches = false complex cache indexing = false number of sets (s) = 128 (size synth) = 65536 (64 KB) --- cache 2 --- cache type = unified cache (3) cache level = 0x2 (2) self-initializing cache level = true fully associative cache = false maximum IDs for CPUs sharing cache = 0x7 (7) maximum IDs for cores in pkg = 0x3f (63) system coherency line size = 0x40 (64) physical line partitions = 0x1 (1) ways of associativity = 0x10 (16) number of sets = 0x1000 (4096) WBINVD/INVD acts on lower caches = false inclusive to lower caches = false complex cache indexing = false number of sets (s) = 4096 (size synth) = 4194304 (4 MB) --- cache 3 --- cache type = unified cache (3) cache level = 0x3 (3) self-initializing cache level = true fully associative cache = false maximum IDs for CPUs sharing cache = 0x1ff (511) maximum IDs for cores in pkg = 0x3f 
(63) system coherency line size = 0x40 (64) physical line partitions = 0x1 (1) ways of associativity = 0xc (12) number of sets = 0x24000 (147456) WBINVD/INVD acts on lower caches = false inclusive to lower caches = false complex cache indexing = true number of sets (s) = 147456 (size synth) = 113246208 (108 MB) --- cache 4 --- cache type = no more caches (0) Suggested-by: Tejus GK <tejus.gk@nutanix.com> Suggested-by: Jason Zeng <jason.zeng@intel.com> Suggested-by: "Daniel P . Berrangé" <berrange@redhat.com> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Reviewed-by: Tao Su <tao1.su@linux.intel.com> Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250711104603.1634832-2-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:22 +02:00
Zhao Liu	2141898bb9	i386/cpu: Use a unified cache_info in X86CPUState At present, all cases using the cache model (CPUID 0x2, 0x4, 0x80000005, 0x80000006 and 0x8000001D leaves) have been verified to be able to select either cache_info_intel or cache_info_amd based on the vendor. Therefore, further merge cache_info_intel and cache_info_amd into a unified cache_info in X86CPUState, and during its initialization, set different legacy cache models based on the vendor. Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250711102143.1622339-19-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:22 +02:00
Zhao Liu	25acae4c6e	i386/cpu: Select legacy cache model based on vendor in CPUID 0x8000001D As preparation for merging cache_info_cpuid4 and cache_info_amd in X86CPUState, set legacy cache model based on vendor in the CPUID 0x8000001D leaf. For AMD CPU, select legacy AMD cache model (in cache_info_amd) as the default cache model like before, otherwise, select legacy Intel cache model (in cache_info_cpuid4). In fact, for Intel (and Zhaoxin) CPU, this change is safe because the extended CPUID level supported by Intel is up to 0x80000008. So Intel Guest doesn't have this 0x8000001D leaf. Although someone could bump "xlevel" up to 0x8000001D for Intel Guest, it's meaningless and this is undefined behavior. This leaf should be considered reserved, but the SDM does not explicitly state this. So, there's no need to specifically use vendor_cpuid_only_v2 to fix anything, as it doesn't even qualify as a fix since nothing is currently broken. Therefore, it is acceptable to select the default legacy cache model based on the vendor. For the CPUID 0x8000001D leaf, in X86CPUState, a unified cache_info is enough. It only needs to be initialized and configured with the corresponding legacy cache model based on the vendor. Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250711102143.1622339-18-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:22 +02:00
Zhao Liu	00fa96c96a	i386/cpu: Select legacy cache model based on vendor in CPUID 0x80000006 As preparation for merging cache_info_cpuid4 and cache_info_amd in X86CPUState, set legacy cache model based on vendor in the CPUID 0x80000006 leaf. For AMD CPU, select legacy AMD cache model (in cache_info_amd) as the default cache model like before, otherwise, select legacy Intel cache model (in cache_info_cpuid4). To ensure compatibility is not broken, add an enable_legacy_vendor_cache flag based on x-vendor-only-v2 to indicate cases where the legacy cache model should be used regardless of the vendor. For CPUID 0x80000006 leaf, enable_legacy_vendor_cache flag indicates to pick legacy Intel cache model, which is for compatibility with the behavior of PC machine v10.0 and older. The following explains how current vendor-based default legacy cache model ensures correctness without breaking compatibility. * For the PC machine v6.0 and older, vendor_cpuid_only=false, and vendor_cpuid_only_v2=false. - If the named CPU model has its own cache model, and doesn't use legacy cache model (legacy_cache=false), then cache_info_cpuid4 and cache_info_amd are the same, so 0x80000006 leaf uses its own cache model regardless of the vendor. - For max/host/named CPU (without its own cache model), then the flag enable_legacy_vendor_cache is true, they will use legacy AMD cache model just like their previous behavior. * For the PC machine v10.0 and older (to v6.1), vendor_cpuid_only=true, and vendor_cpuid_only_v2=false. - No change, since this leaf isn't aware of vendor_cpuid_only. * For the PC machine v10.1 and newer, vendor_cpuid_only=true, and vendor_cpuid_only_v2=true. - If the named CPU model has its own cache model (legacy_cache=false), then cache_info_cpuid4 & cache_info_amd both equal to its own cache model, so it uses its own cache model in 0x80000006 leaf regardless of the vendor. 
Intel and Zhaoxin CPUs have their special encoding based on SDM, which is the expected behavior and no different from before. - For max/host/named CPU (without its own cache model), then the flag enable_legacy_vendor_cache is false, the legacy cache model is selected based on vendor. For AMD CPU, it will use legacy AMD cache as before. For non-AMD (Intel/Zhaoxin) CPU, it will use legacy Intel cache and be encoded based on SDM as expected. Here, selecting the legacy cache model based on the vendor does not change the previous (before the change) behavior. Therefore, the above analysis proves that, with the help of the flag enable_legacy_vendor_cache, it is acceptable to select the default legacy cache model based on the vendor. For the CPUID 0x80000006 leaf, in X86CPUState, a unified cache_info is enough. It only needs to be initialized and configured with the corresponding legacy cache model based on the vendor. Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20250711102143.1622339-17-zhao1.liu@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-07-12 15:28:22 +02:00


2539 commits