Commit graph

2539 commits

Author SHA1 Message Date
Xiaoyao Li
b5ff08e64e i386/kvm: Drop KVM_CAP_X86_SMM check in kvm_arch_init()
x86_machine_is_smm_enabled() checks the KVM_CAP_X86_SMM for KVM
case. No need to check KVM_CAP_X86_SMM in kvm_arch_init().

So just drop the check of KVM_CAP_X86_SMM to simplify the code.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250729062014.1669578-3-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-09-17 19:01:55 +02:00
Xiaoyao Li
591f817d81 target/i386: Define enum X86ASIdx for x86's address spaces
Define X86ASIdx as enum, like ARM's ARMASIdx, so that it's clear index 0
is for memory and index 1 is for SMM.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Tested-By: Kirill Martynov <stdcalllevi@yandex-team.ru>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250730095253.1833411-3-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-09-17 19:00:58 +02:00
Xiaoyao Li
0516f4b702 i386/cpu: Enable SMM cpu address space under KVM
Kirill Martynov reported assertation in cpu_asidx_from_attrs() being hit
when x86_cpu_dump_state() is called to dump the CPU state[*]. It happens
when the CPU is in SMM and KVM emulation failure due to misbehaving
guest.

The root cause is that QEMU i386 never enables the SMM address space for
cpu since KVM SMM support has been added.

Enable the SMM cpu address space under KVM when the SMM is enabled for
the x86machine.

[*] https://lore.kernel.org/qemu-devel/20250523154431.506993-1-stdcalllevi@yandex-team.ru/

Reported-by: Kirill Martynov <stdcalllevi@yandex-team.ru>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Kirill Martynov <stdcalllevi@yandex-team.ru>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250730095253.1833411-2-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-09-17 19:00:58 +02:00
Paolo Bonzini
d5e33b5f8f accel: make all calls to qemu_process_cpu_events look the same
There is no reason for some accelerators to use qemu_process_cpu_events_common
(which is separated from qemu_process_cpu_events() specifically for round
robin TCG).  They can also check for events directly on the first pass through
the loop, instead of setting cpu->exit_request to true.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-09-17 19:00:56 +02:00
Paolo Bonzini
9a191d3782 cpus: clear exit_request in qemu_process_cpu_events
Make the code common to all accelerators: after seeing cpu->exit_request
set to true, accelerator code needs to reach qemu_process_cpu_events_common().

So for the common cases where they use qemu_process_cpu_events(), go ahead and
clear it in there.  Note that the cheap qatomic_set() is enough because
at this point the thread has taken the BQL; qatomic_set_mb() is not needed.
In particular, this is the ordering of the communication between
I/O and vCPU threads is always the same.

In the I/O thread:

(a) store other memory locations that will be checked if cpu->exit_request
    or cpu->interrupt_request is 1 (for example cpu->stop or cpu->work_list
    for cpu->exit_request)

(b) cpu_exit(): store-release cpu->exit_request, or
(b) cpu_interrupt(): store-release cpu->interrupt_request

>>> at this point, cpu->halt_cond is broadcast and the BQL released

(c) do the accelerator-specific kick (e.g. write icount_decr for TCG,
    pthread_kill for KVM, etc.)

In the vCPU thread instead the opposite order is respected:

(c) the accelerator's execution loop exits thanks to the kick

(b) then the inner execution loop checks cpu->interrupt_request
    and cpu->exit_request.  If needed cpu->interrupt_request is
    converted into cpu->exit_request when work is needed outside
    the execution loop.

(a) then the other memory locations are checked.  Some may need to
    be read under the BQL, but the vCPU thread may also take other
    locks (e.g. for queued work items) or none at all.

qatomic_set_mb() would only be needed if the halt sleep was done
outside the BQL (though in that case, cpu->exit_request probably
would be replaced by a QemuEvent or something like that).

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-09-17 19:00:56 +02:00
Paolo Bonzini
871de7078f treewide: rename qemu_wait_io_event/qemu_wait_io_event_common
Do so before extending it to the user-mode emulators, where there is no
such thing as an "I/O thread".

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-09-17 19:00:55 +02:00
Paolo Bonzini
f8217ae54e cpus: properly kick CPUs out of inner execution loop
Now that cpu_exit() actually kicks all accelerators, use it whenever
the message to another thread is processed in qemu_wait_io_event().

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-09-17 19:00:55 +02:00
Paolo Bonzini
f084ff128b accel: use atomic accesses for exit_request
CPU threads write exit_request as a "note to self" that they need to
go out to a slow path.  This write happens out of the BQL and can be
a data race with another threads' cpu_exit(); use atomic accesses
consistently.

While at it, change the source argument from int ("1") to bool ("true").

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-09-17 19:00:55 +02:00
Paolo Bonzini
ac6c8a390b accel: use store_release/load_acquire for cross-thread exit_request
Reads and writes cpu->exit_request do not use a load-acquire/store-release
pair right now, but this means that cpu_exit() may not write cpu->exit_request
after any flags that are read by the vCPU thread.

Probably everything is protected one way or the other by the BQL, because
cpu->exit_request leads to the slow path, where the CPU thread often takes
the BQL (for example, to go to sleep by waiting on the BQL-protected
cpu->halt_cond); but it's not clear, so use load-acquire/store-release
consistently.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-09-17 19:00:55 +02:00
Paolo Bonzini
602d5ebba2 treewide: clear bits of cs->interrupt_request with cpu_reset_interrupt()
Open coding cpu_reset_interrupt() can cause bugs if the BQL is not
taken, for example i386 has the call chain kvm_cpu_exec() ->
kvm_put_vcpu_events() -> kvm_arch_put_registers().

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-09-17 19:00:55 +02:00
Paolo Bonzini
3efe1a0f60 target/i386: limit a20 to system emulation
It is not used by user-mode emulation and is the only caller of
cpu_interrupt() in qemu-i386 and qemu-x86_64.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-09-17 19:00:55 +02:00
Markus Armbruster
b2e4534a2c i386/kvm/vmsr_energy: Plug memory leak on failure to connect socket
vmsr_open_socket() leaks the Error set by
qio_channel_socket_connect_sync().  Plug the leak by not creating the
Error.

Fixes: 0418f90809 (Add support for RAPL MSRs in KVM/Qemu)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20250723133257.1497640-2-armbru@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
2025-09-01 13:10:55 +02:00
Igor Mammedov
17e645c6f1 kvm: i386: irqchip: take BQL only if there is an interrupt
when kernel-irqchip=split is used, QEMU still hits BQL
contention issue when reading ACPI PM/HPET timers
(despite of timer[s] access being lock-less).

So Windows with more than 255 cpus is still not able to
boot (since it requires iommu -> split irqchip).

Problematic path is in kvm_arch_pre_run() where BQL is taken
unconditionally when split irqchip is in use.

There are a few parts that BQL protects there:
  1. interrupt check and injecting

    however we do not take BQL when checking for pending
    interrupt (even within the same function), so the patch
    takes the same approach for cpu->interrupt_request checks
    and takes BQL only if there is a job to do.

  2. request_interrupt_window access
      CPUState::kvm_run::request_interrupt_window doesn't need BQL
      as it's accessed by its own vCPU thread.

  3. cr8/cpu_get_apic_tpr access
      the same (as #2) applies to CPUState::kvm_run::cr8,
      and APIC registers are also cached/synced (get/put) within
      the vCPU thread it belongs to.

Taking BQL only when is necessary, eleminates BQL bottleneck on
IO/MMIO only exit path, improoving latency by 80% on HPET micro
benchmark.

This lets Windows to boot succesfully (in case hv-time isn't used)
when more than 255 vCPUs are in use.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20250814160600.2327672-8-imammedo@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-08-29 12:48:14 +02:00
Igor Mammedov
87511341c3 add cpu_test_interrupt()/cpu_set_interrupt() helpers and use them tree wide
The helpers form load-acquire/store-release pair and ensure
that appropriate barriers are in place in case checks happen
outside of BQL.

Use them to replace open-coded checkers/setters across the code,
to make sure that barriers are not missed.  Helpers also make code a
bit more readable.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Link: https://lore.kernel.org/r/20250821155603.2422553-1-imammedo@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-08-29 12:48:14 +02:00
Zero Tang
c12cbaa007 i386/tcg/svm: fix incorrect canonicalization
For all 32-bit systems and 64-bit Windows systems, "long" is 4 bytes long.
Due to using "long" for a linear address, svm_canonicalization would
set all high bits to 1 when (assuming 48-bit linear address) the segment
base is bigger than 0x7FFF.

This fixes booting guests under TCG when the guest IDT and GDT bases are
above 0x7FFF, thereby resulting in incorrect bases. When an interrupt
arrives, it would trigger a #PF exception; the #PF would trigger again,
resulting in a #DF exception; the #PF would trigger for the third time,
resulting in triple-fault, and eventually causes a shutdown VM-Exit to
the hypervisor right after guest boot.

Cc: qemu-stable@nongnu.org
Signed-off-by: Zero Tang <zero.tangptr@gmail.com>
2025-08-27 10:57:03 +02:00
Xin Wang
27535e9cca target/i386: Add support for save/load of exception error code
For now, qemu save/load CPU exception info(such as exception_nr and
has_error_code), while the exception error_code is ignored. This will
cause the dest hypervisor reinject a vCPU exception with error_code(0),
potentially causing a guest kernel panic.

For instance, if src VM stopped with an user-mode write #PF (error_code 6),
the dest hypervisor will reinject an #PF with error_code(0) when vCPU resume,
then guest kernel panic as:
  BUG: unable to handle page fault for address: 00007f80319cb010
  #PF: supervisor read access in user mode
  #PF: error_code(0x0000) - not-present page
  RIP: 0033:0x40115d

To fix it, support save/load exception error_code.

Signed-off-by: Xin Wang <wangxinxin.wang@huawei.com>
Link: https://lore.kernel.org/r/20250819145834.3998-1-wangxinxin.wang@huawei.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-08-20 22:47:43 +02:00
Paolo Bonzini
faaaf017d5 Revert "i386/cpu: Warn about why CPUID_EXT_PDCM is not available"
This reverts commit 00268e0002.
(The only conflict is in the !is_tdx_vm() part of the condition,
which is safe to keep).

mark_unavailable_features() actively blocks usage of the feature,
so it is a functional change, not merely a emitting warning.
The commit was intended to merely warn if PDCM was enabled when
the performance counters are not, so revert it.

Reported-by: Christian A. Ehrhardt <christian.ehrhardt@canonical.com>
Analyzed-by: Daniel P. Berrangé <berrange@redhat.com>
Analyzed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20250819150235.785559-1-pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-08-19 14:05:56 -04:00
Zhao Liu
4e5d58969e target/i386/cpu: Move addressable ID encoding out of compat property in CPUID[0x1]
Currently, the addressable ID encoding for CPUID[0x1].EBX[bits 16-23]
(Maximum number of addressable IDs for logical processors in this
physical package) is covered by vendor_cpuid_only_v2 compat property.
The previous consideration was to avoid breaking migration and this
compat property makes it unfriendly to backport the commit f985a1195b
("i386/cpu: Fix number of addressable IDs field for CPUID.01H.EBX
[23:16]").

However, NetBSD booting is broken since the commit 88dd4ca06c
("i386/cpu: Use APIC ID info to encode cache topo in CPUID[4]"),
because NetBSD calculates smt information via `lp_max` / `core_max` for
legacy Intel CPUs which doesn't support 0xb leaf, where `lp_max` is from
CPUID[0x1].EBX.bits[16-23] and `core_max` is from CPUID[0x4].0x0.bits[26
-31].

The commit 88dd4ca0 changed the encoding rule of `core_max` but didn't
update `lp_max`, so that NetBSD would get the wrong smt information,
which leads to the module loading failure.

Luckily, the commit f985a1195b ("i386/cpu: Fix number of addressable
IDs field for CPUID.01H.EBX[23:16]") updated the encoding rule for
`lp_max` and accidentally fixed the NetBSD issue too. This also shows
that using CPUID[0x1] and CPUID[0x4].0x0 to calculate HT/SMT information
is a common practice to detect CPU topology on legacy Intel CPUs.

Therefore, it's necessary to backport the commit f985a1195b to
previous stable QEMU to help address the similar issues as well. Then
the compat property is not needed any more since all stable QEMUs will
follow the same encoding way.

So, in CPUID[0x1], move addressable ID encoding out of compat property.

Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Inspired-by: Chuang Xu <xuchuangxclwt@bytedance.com>
Fixes: commit f985a1195b ("i386/cpu: Fix number of addressable IDs field for CPUID.01H.EBX[23:16]")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3061
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Message-ID: <20250804053548.1808629-1-zhao1.liu@intel.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-08-05 17:30:29 +02:00
Paolo Bonzini
feea87cd6b target/i386: fix width of third operand of VINSERTx128
Table A-5 of the Intel manual incorrectly lists the third operand of
VINSERTx128 as Wqq, but it is actually a 128-bit value.  This is
visible when W is a memory operand close to the end of the page.

Fixes the recently-added poly1305_kunit test in linux-next.

(No testcase yet, but I plan to modify test-avx2 to use memory
close to the end of the page.  This would work because the test
vectors correctly have the memory operand as xmm2/m128).

Reported-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Eric Biggers <ebiggers@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: qemu-stable@nongnu.org
Fixes: 7906847768 ("target/i386: reimplement 0x0f 0x3a, add AVX", 2022-10-18)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-25 14:51:11 +02:00
Xiaoyao Li
f64832033d i386/tdx: Remove the redundant qemu_mutex_init(&tdx->lock)
Commit 40da501d89 ("i386/tdx: handle TDG.VP.VMCALL<GetQuote>") added
redundant qemu_mutex_init(&tdx->lock) in tdx_guest_init by mistake.

Fix it by removing the redundant one.

Fixes: 40da501d89 ("i386/tdx: handle TDG.VP.VMCALL<GetQuote>")
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Link: https://lore.kernel.org/r/20250717103707.688929-1-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-17 17:18:59 +02:00
Xiaoyao Li
5fe6b9a854 i386/cpu: Cleanup host_cpu_max_instance_init()
The implementation of host_cpu_max_instance_init() was merged into
host_cpu_instance_init() by commit 29f1ba338b ("target/i386: merge
host_cpu_instance_init() and host_cpu_max_instance_init()"), while the
declaration of it remains in host-cpu.h.

Clean it up.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250716063117.602050-1-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-17 17:18:59 +02:00
Paolo Bonzini
f2b7879763 target/i386: tdx: fix locking for interrupt injection
Take tdx_guest->lock when injecting the event notification interrupt into
the guest.

Fixes CID 1612364.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-17 17:18:59 +02:00
Zhao Liu
e52af92e9e i386/cpu: Move x86_ext_save_areas[] initialization to .instance_init
In x86_cpu_post_initfn(), the initialization of x86_ext_save_areas[]
marks the unsupported xsave areas based on Host support.

This step must be done before accel_cpu_instance_init(), otherwise,
KVM's assertion on host xsave support would fail:

qemu-system-x86_64: ../target/i386/kvm/kvm-cpu.c:149:
kvm_cpu_xsave_init: Assertion `esa->size == eax' failed.

(on AMD EPYC 7302 16-Core Processor)

Move x86_ext_save_areas[] initialization to .instance_init and place it
before accel_cpu_instance_init().

Fixes: commit 5f158abef4 ("target/i386: move accel_cpu_instance_init to .instance_init")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250717023933.2502109-1-zhao1.liu@intel.com
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-17 15:51:29 +02:00
Paolo Bonzini
d3a24134e3 target/i386: do not expose ARCH_CAPABILITIES on AMD CPU
KVM emulates the ARCH_CAPABILITIES on x86 for both Intel and AMD
cpus, although the IA32_ARCH_CAPABILITIES MSR is an Intel-specific
MSR and it makes no sense to emulate it on AMD.

As a consequence, VMs created on AMD with qemu -cpu host and using
KVM will advertise the ARCH_CAPABILITIES feature and provide the
IA32_ARCH_CAPABILITIES MSR. This can cause issues (like Windows BSOD)
as the guest OS might not expect this MSR to exist on such cpus (the
AMD documentation specifies that ARCH_CAPABILITIES feature and MSR
are not defined on the AMD architecture).

A fix was proposed in KVM code, however KVM maintainers don't want to
change this behavior that exists for 6+ years and suggest changes to be
done in QEMU instead.  Therefore, hide the bit from "-cpu host":
migration of -cpu host guests is only possible between identical host
kernel and QEMU versions, therefore this is not a problematic breakage.

If a future AMD machine does include the MSR, that would re-expose the
Windows guest bug; but it would not be KVM/QEMU's problem at that
point, as we'd be following a genuine physical CPU impl.

Reported-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-17 15:50:45 +02:00
Stefan Hajnoczi
f96b157ebb Accelerators patches
- Unify x86/arm hw/xen/arch_hvm.h header
 - Move non-system-specific 'accel/accel-ops.h' and 'accel-cpu-ops.h' to accel/
 - Move KVM definitions qapi/accelerator.json
 - Add @qom-type field to CpuInfoFast QAPI structure
 - Display CPU model name in 'info cpus' HMP command
 - Introduce @x-accel-stats QMP command
 - Add 'info accel' on HMP
 - Improve qemu_add_vm_change_state_handler*() docstring
 - Extract TCG statistic related code to tcg-stats.c
 - Implement AccelClass::get_[vcpu]_stats() handlers for TCG and HVF
 - Do not dump NaN in TCG statistics
 - Revert incomplete "accel/tcg: Unregister the RCU before exiting RR thread"
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmh2r4UACgkQ4+MsLN6t
 wN5i6xAAkOvwFh1GmsPUdz5RxzsWoIUDvyENg6E8Axwe5tSEMRFiPjabbTQJomQg
 GZt75XIS24LZFZ+hvqrLSA+dFgXTgWv08ZE81EjwjmAMBlLCOPhCgeN6C1p8100Y
 scSvRJbP9k9lpA5K7et/1X4AkK2cZyh+LGJgCjr2Al2mbERpPueDF8fxqeohFvXQ
 nTSks4XlA0yQ06+9r49aQAiuXvgg9lDT1wIglD2HEV7vOVs/ud+yyL8+z5YMeFzx
 pSIc6wDu4PqdA46w4MZs90uTy7S/PMvBiYDEiV3tKzg0MLttvFGlT58/YjVtguTP
 mNkfwIEwQtDQzoxsFIJO7yBTlTRBs95V4aIVk3pB+Gb/bideRPIkeVQvgMSEBKj7
 N0pEXWOxfB9iIWO6b1utYpQ4uxeDOU/8DPUCit1IBbNgKTaJkJb77fboYk7NaB0K
 KEtObAk6jMatB/xr+vUFWc4sMk9wlm72w8wcQzgKZ0xV2U3d1/Y/9nS4GvI510ev
 TRQ3mKj7N319uCeId1czF6W8rillCJ2u8ZK53u+Nfp7R3PbsRSMc6IDJ1UdDUlyR
 HFcWHxbcbEGhe8SnFGab4Qd6fWChcn2EaEoAJJz+Rqv0k3zcwqccNM5waCABAjTE
 0S22JIHePJKcpkMLGq3EOUAQuu+8Zsol7gPCLxSAMclVqPTl9ck=
 =rAav
 -----END PGP SIGNATURE-----

Merge tag 'accel-20250715' of https://github.com/philmd/qemu into staging

Accelerators patches

- Unify x86/arm hw/xen/arch_hvm.h header
- Move non-system-specific 'accel/accel-ops.h' and 'accel-cpu-ops.h' to accel/
- Move KVM definitions qapi/accelerator.json
- Add @qom-type field to CpuInfoFast QAPI structure
- Display CPU model name in 'info cpus' HMP command
- Introduce @x-accel-stats QMP command
- Add 'info accel' on HMP
- Improve qemu_add_vm_change_state_handler*() docstring
- Extract TCG statistic related code to tcg-stats.c
- Implement AccelClass::get_[vcpu]_stats() handlers for TCG and HVF
- Do not dump NaN in TCG statistics
- Revert incomplete "accel/tcg: Unregister the RCU before exiting RR thread"

 # -----BEGIN PGP SIGNATURE-----
 #
 # iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmh2r4UACgkQ4+MsLN6t
 # wN5i6xAAkOvwFh1GmsPUdz5RxzsWoIUDvyENg6E8Axwe5tSEMRFiPjabbTQJomQg
 # GZt75XIS24LZFZ+hvqrLSA+dFgXTgWv08ZE81EjwjmAMBlLCOPhCgeN6C1p8100Y
 # scSvRJbP9k9lpA5K7et/1X4AkK2cZyh+LGJgCjr2Al2mbERpPueDF8fxqeohFvXQ
 # nTSks4XlA0yQ06+9r49aQAiuXvgg9lDT1wIglD2HEV7vOVs/ud+yyL8+z5YMeFzx
 # pSIc6wDu4PqdA46w4MZs90uTy7S/PMvBiYDEiV3tKzg0MLttvFGlT58/YjVtguTP
 # mNkfwIEwQtDQzoxsFIJO7yBTlTRBs95V4aIVk3pB+Gb/bideRPIkeVQvgMSEBKj7
 # N0pEXWOxfB9iIWO6b1utYpQ4uxeDOU/8DPUCit1IBbNgKTaJkJb77fboYk7NaB0K
 # KEtObAk6jMatB/xr+vUFWc4sMk9wlm72w8wcQzgKZ0xV2U3d1/Y/9nS4GvI510ev
 # TRQ3mKj7N319uCeId1czF6W8rillCJ2u8ZK53u+Nfp7R3PbsRSMc6IDJ1UdDUlyR
 # HFcWHxbcbEGhe8SnFGab4Qd6fWChcn2EaEoAJJz+Rqv0k3zcwqccNM5waCABAjTE
 # 0S22JIHePJKcpkMLGq3EOUAQuu+8Zsol7gPCLxSAMclVqPTl9ck=
 # =rAav
 # -----END PGP SIGNATURE-----
 # gpg: Signature made Tue 15 Jul 2025 15:44:05 EDT
 # gpg:                using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
 # gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full]
 # Primary key fingerprint: FAAB E75E 1291 7221 DCFD  6BB2 E3E3 2C2C DEAD C0DE

* tag 'accel-20250715' of https://github.com/philmd/qemu:
  system/runstate: Document qemu_add_vm_change_state_handler_prio* in hdr
  system/runstate: Document qemu_add_vm_change_state_handler()
  accel/hvf: Implement AccelClass::get_vcpu_stats() handler
  accel/tcg: Implement AccelClass::get_stats() handler
  accel/tcg: Propagate AccelState to dump_accel_info()
  accel/system: Add 'info accel' on human monitor
  accel/system: Introduce @x-accel-stats QMP command
  accel/tcg: Extract statistic related code to tcg-stats.c
  Revert "accel/tcg: Unregister the RCU before exiting RR thread"
  accel: Extract AccelClass definition to 'accel/accel-ops.h'
  accel: Rename 'system/accel-ops.h' -> 'accel/accel-cpu-ops.h'
  accel/tcg: Do not dump NaN statistics
  hw/core/machine: Display CPU model name in 'info cpus' command
  qapi/machine: Add @qom-type field to CpuInfoFast structure
  qapi/accel: Move definitions related to accelerators in their own file
  hw/arm/xen-pvh: Remove unnecessary 'hw/xen/arch_hvm.h' header
  hw/xen/arch_hvm: Unify x86 and ARM variants

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Conflicts:
  qapi/machine.json
  Commit 0462da9d6b ("qapi: remove trivial "Returns:" sections")
  removed trivial "Returns:". This caused a conflict with the move from
  machine.json to accelerator.json.
2025-07-16 07:13:40 -04:00
Stefan Hajnoczi
e452053097 virtio,pci,pc: features, fixes, tests
SPCR acpi table can now be disabled
 vhost-vdpa can now report hashing capability to guest
 PPTT acpi table now tells guest vCPUs are identical
 vost-user-blk now shuts down faster
 loongarch64 now supports bios-tables-test
 intel_iommu now supports ATS
 cxl now supports DCD Fabric Management Command Set
 arm now supports acpi pci hotplug
 
 fixes, cleanups
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCgAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmh1+7APHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpcZ8H/2udpCZ49vjPB8IwQAGdFTw2TWVdxUQFHexQ
 pOsCGyFBNAXqD1bmb8lwWyYVJ08WELyL6xWsQ5tfVPiXpKYYHPHl4rNr/SPoyNcv
 joY++tagudmOki2DU7nfJ+rPIIuigOTUHbv4TZciwcHle6f65s0iKXhR1sL0cj4i
 TS6iJlApSuJInrBBUxuxSUomXk79mFTNKRiXj1k58LRw6JOUEgYvtIW8i+mOUcTg
 h1dZphxEQr/oG+a2pM8GOVJ1AFaBPSfgEnRM4kTX9QuTIDCeMAKUBo/mwOk6PV7z
 ZhSrDPLrea27XKGL++EJm0fFJ/AsHF1dTks2+c0rDrSK+UV87Zc=
 =sktm
 -----END PGP SIGNATURE-----

Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging

virtio,pci,pc: features, fixes, tests

SPCR acpi table can now be disabled
vhost-vdpa can now report hashing capability to guest
PPTT acpi table now tells guest vCPUs are identical
vost-user-blk now shuts down faster
loongarch64 now supports bios-tables-test
intel_iommu now supports ATS
cxl now supports DCD Fabric Management Command Set
arm now supports acpi pci hotplug

fixes, cleanups

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

 # -----BEGIN PGP SIGNATURE-----
 #
 # iQFDBAABCgAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmh1+7APHG1zdEByZWRo
 # YXQuY29tAAoJECgfDbjSjVRpcZ8H/2udpCZ49vjPB8IwQAGdFTw2TWVdxUQFHexQ
 # pOsCGyFBNAXqD1bmb8lwWyYVJ08WELyL6xWsQ5tfVPiXpKYYHPHl4rNr/SPoyNcv
 # joY++tagudmOki2DU7nfJ+rPIIuigOTUHbv4TZciwcHle6f65s0iKXhR1sL0cj4i
 # TS6iJlApSuJInrBBUxuxSUomXk79mFTNKRiXj1k58LRw6JOUEgYvtIW8i+mOUcTg
 # h1dZphxEQr/oG+a2pM8GOVJ1AFaBPSfgEnRM4kTX9QuTIDCeMAKUBo/mwOk6PV7z
 # ZhSrDPLrea27XKGL++EJm0fFJ/AsHF1dTks2+c0rDrSK+UV87Zc=
 # =sktm
 # -----END PGP SIGNATURE-----
 # gpg: Signature made Tue 15 Jul 2025 02:56:48 EDT
 # gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
 # gpg:                issuer "mst@redhat.com"
 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
 # gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [full]
 # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
 #      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (97 commits)
  hw/cxl: mailbox-utils: 0x5605 - FMAPI Initiate DC Release
  hw/cxl: mailbox-utils: 0x5604 - FMAPI Initiate DC Add
  hw/cxl: Create helper function to create DC Event Records from extents
  hw/cxl: mailbox-utils: 0x5603 - FMAPI Get DC Region Extent Lists
  hw/cxl: mailbox-utils: 0x5602 - FMAPI Set DC Region Config
  hw/mem: cxl_type3: Add DC Region bitmap lock
  hw/cxl: Move definition for dynamic_capacity_uuid and enum for DC event types to header
  hw/cxl: mailbox-utils: 0x5601 - FMAPI Get Host Region Config
  hw/mem: cxl_type3: Add dsmas_flags to CXLDCRegion struct
  hw/cxl: mailbox-utils: 0x5600 - FMAPI Get DCD Info
  hw/cxl: fix DC extent capacity tracking
  tests: virt: Update expected ACPI tables for virt test
  hw/acpi/aml-build: Build a root node in the PPTT table
  hw/acpi/aml-build: Set identical implementation flag for PPTT processor nodes
  tests: virt: Allow changes to PPTT test table
  qtest/bios-tables-test: Generate reference blob for DSDT.acpipcihp
  qtest/bios-tables-test: Generate reference blob for DSDT.hpoffacpiindex
  tests/qtest/bios-tables-test: Add aarch64 ACPI PCI hotplug test
  tests/qtest/bios-tables-test: Prepare for addition of acpi pci hp tests
  hw/arm/virt: Let virt support pci hotplug/unplug GED event
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Conflicts:
  net/vhost-vdpa.c
  vhost_vdpa_set_steering_ebpf() was removed, resolve the context
  conflict.
2025-07-16 07:00:47 -04:00
Philippe Mathieu-Daudé
f7a7e7dd21 accel: Extract AccelClass definition to 'accel/accel-ops.h'
Only accelerator implementations (and the common accelator
code) need to know about AccelClass internals. Move the
definition out but forward declare AccelState and AccelClass.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250703173248.44995-39-philmd@linaro.org>
2025-07-15 19:34:33 +02:00
Philippe Mathieu-Daudé
05927e9dc9 accel: Rename 'system/accel-ops.h' -> 'accel/accel-cpu-ops.h'
Unfortunately "system/accel-ops.h" handlers are not only
system-specific. For example, the cpu_reset_hold() hook
is part of the vCPU creation, after it is realized.

Mechanical rename to drop 'system' using:

  $ sed -i -e s_system/accel-ops.h_accel/accel-cpu-ops.h_g \
              $(git grep -l system/accel-ops.h)

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250703173248.44995-38-philmd@linaro.org>
2025-07-15 19:34:33 +02:00
Philippe Mathieu-Daudé
0f64fb6743 qemu: Declare all load/store helper in 'qemu/bswap.h'
Restrict "exec/tswap.h" to the tswap*() methods,
move the load/store helpers with the other ones
declared in "qemu/bswap.h".

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-Id: <20250708215320.70426-8-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-07-15 02:56:39 -04:00
Zhao Liu
5d21ee453a i386/cpu: Honor maximum value for CPUID.8000001DH.EAX[25:14]
CPUID.8000001DH:EAX[25:14] is "NumSharingCache", and the number of
logical processors sharing this cache is the value of this field
incremented by 1. Because of its width limitation, the maximum value
currently supported is 4095.

Though at present Q35 supports up to 4096 CPUs, by constructing a
specific topology, the width of the APIC ID can be extended beyond 12
bits. For example, using `-smp threads=33,cores=9,modules=9` results in
a die level offset of 6 + 4 + 4 = 14 bits, which can also cause
overflow. Check and honor the maximum value as CPUID.04H did.

Cc: Babu Moger <babu.moger@amd.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250714080859.1960104-8-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-14 10:29:17 +02:00
Qian Wen
3e86124e7c i386/cpu: Fix overflow of cache topology fields in CPUID.04H
According to SDM, CPUID.0x4:EAX[31:26] indicates the Maximum number of
addressable IDs for processor cores in the physical package. If we
launch over 64 cores VM, the 6-bit field will overflow, and the wrong
core_id number will be reported.

Since the HW reports 0x3f when the intel processor has over 64 cores,
limit the max value written to EAX[31:26] to 63, so max num_cores should
be 64.

For EAX[14:25], though at present Q35 supports up to 4096 CPUs, by
constructing a specific topology, the width of the APIC ID can be
extended beyond 12 bits. For example, using `-smp threads=33,cores=9,
modules=9` results in a die level offset of 6 + 4 + 4 = 14 bits, which
can also cause overflow.  check and honor the maximum value for
EAX[14:25] as well.

In addition, for host-cache-info case, also apply the same checks and
fixes.

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Qian Wen <qian.wen@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250714080859.1960104-7-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-14 10:29:17 +02:00
Qian Wen
a62fef5829 i386/cpu: Fix cpu number overflow in CPUID.01H.EBX[23:16]
The legacy topology enumerated by CPUID.1.EBX[23:16] is defined in SDM
Vol2:

Bits 23-16: Maximum number of addressable IDs for logical processors in
this physical package.

When threads_per_socket > 255, it will 1) overwrite bits[31:24] which is
apic_id, 2) bits [23:16] get truncated.

Specifically, if launching the VM with -smp 256, the value written to
EBX[23:16] is 0 because of data overflow. If the guest only supports
legacy topology, without V2 Extended Topology enumerated by CPUID.0x1f
or Extended Topology enumerated by CPUID.0x0b to support over 255 CPUs,
the return of the kernel invoking cpu_smt_allowed() is false and APs
(application processors) will fail to bring up. Then only CPU 0 is online,
and others are offline.

For example, launch VM via:
qemu-system-x86_64 -M q35,accel=kvm,kernel-irqchip=split \
    -cpu qemu64,cpuid-0xb=off -smp 256 -m 32G \
    -drive file=guest.img,if=none,id=virtio-disk0,format=raw \
    -device virtio-blk-pci,drive=virtio-disk0,bootindex=1 --nographic

The guest shows:
    CPU(s):               256
    On-line CPU(s) list:  0
    Off-line CPU(s) list: 1-255

To avoid this issue caused by overflow, limit the max value written to
EBX[23:16] to 255 as the HW does.

Cc: qemu-stable@nongnu.org
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Qian Wen <qian.wen@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250714080859.1960104-6-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-14 10:29:17 +02:00
Chuang Xu
f985a1195b i386/cpu: Fix number of addressable IDs field for CPUID.01H.EBX[23:16]
When QEMU is started with:
-cpu host,migratable=on,host-cache-info=on,l3-cache=off
-smp 180,sockets=2,dies=1,cores=45,threads=2

On Intel platform:
CPUID.01H.EBX[23:16] is defined as "max number of addressable IDs for
logical processors in the physical package".

When executing "cpuid -1 -l 1 -r" in the guest, we obtain a value of 90 for
CPUID.01H.EBX[23:16], whereas the expected value is 128. Additionally,
executing "cpuid -1 -l 4 -r" in the guest yields a value of 63 for
CPUID.04H.EAX[31:26], which matches the expected result.

As (1+CPUID.04H.EAX[31:26]) rounds up to the nearest power-of-2 integer,
it's necessary to round up CPUID.01H.EBX[23:16] to the nearest power-of-2
integer too. Otherwise there would be unexpected results in guest with
older kernel.

For example, when QEMU is started with CLI above and xtopology is disabled,
guest kernel 5.15.120 uses CPUID.01H.EBX[23:16]/(1+CPUID.04H.EAX[31:26]) to
calculate threads-per-core in detect_ht(). Then guest will get "90/(1+63)=1"
as the result, even though threads-per-core should actually be 2.

And on AMD platform:
CPUID.01H.EBX[23:16] is defined as "Logical processor count". Current
result meets our expectation.

So round up CPUID.01H.EBX[23:16] to the nearest power-of-2 integer only
for Intel platform to solve the unexpected result.

Use the "x-vendor-cpuid-only-v2" compat option to fix this issue.

Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Guixiong Wei <weiguixiong@bytedance.com>
Signed-off-by: Yipeng Yin <yinyipeng@bytedance.com>
Signed-off-by: Chuang Xu <xuchuangxclwt@bytedance.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250714080859.1960104-5-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-14 10:29:17 +02:00
Zhao Liu
075e91a4a4 i386/cpu: Reorder CPUID leaves in cpu_x86_cpuid()
Sort the CPUID leaves strictly by index to facilitate checking and
changing.

Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Tao Su <tao1.su@linux.intel.com>
Link: https://lore.kernel.org/r/20250627035129.2755537-5-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-14 10:29:17 +02:00
Zhao Liu
da84c01154 i386/cpu: Mark CPUID 0x80000008 ECX bits[0:7] & [12:15] as reserved for Intel/Zhaoxin
Per SDM,

80000008H EAX Linear/Physical Address size.
              Bits 07-00: #Physical Address Bits*.
              Bits 15-08: #Linear Address Bits.
              Bits 31-16: Reserved = 0.
          EBX Bits 08-00: Reserved = 0.
              Bit 09: WBNOINVD is available if 1.
              Bits 31-10: Reserved = 0.
          ECX Reserved = 0.
          EDX Reserved = 0.

ECX/EDX in CPUID 0x80000008 leaf are reserved.

Currently, in QEMU, only ECX bits[0:7] and ECX bits[12:15] are encoded,
and both are emulated in QEMU.

Considering that Intel and Zhaoxin are already using the 0x1f leaf to
describe CPU topology, which includes similar information, Intel and
Zhaoxin will not implement ECX bits[0:7] and bits[12:15] of 0x80000008.

Therefore, mark these two fields as reserved and clear them for Intel
and Zhaoxin guests.

Reviewed-by: Tao Su <tao1.su@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250714080859.1960104-3-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-14 10:29:12 +02:00
Zhao Liu
1c52c470ba i386/cpu: Mark CPUID 0x80000007[EBX] as reserved for Intel
Per SDM,

80000007H EAX Reserved = 0.
          EBX Reserved = 0.
          ECX Reserved = 0.
          EDX Bits 07-00: Reserved = 0.
              Bit 08: Invariant TSC available if 1.
              Bits 31-09: Reserved = 0.

EAX/EBX/ECX in CPUID 0x80000007 leaf are reserved for Intel.

At present, EAX is reserved for AMD, too. And AMD hasn't used ECX in
QEMU. So these 2 registers are both left as 0.

Therefore, only fix the EBX and excode it as 0 for Intel.

Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Tao Su <tao1.su@linux.intel.com>
Link: https://lore.kernel.org/r/20250627035129.2755537-3-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-14 10:27:07 +02:00
Zhao Liu
a539cd2614 i386/cpu: Mark EBX/ECX/EDX in CPUID 0x80000000 leaf as reserved for Intel
Per SDM,

80000000H EAX Maximum Input Value for Extended Function CPUID Information.
          EBX Reserved.
          ECX Reserved.
          EDX Reserved.

EBX/ECX/EDX in CPUID 0x80000000 leaf are reserved. Intel is using 0x0
leaf to encode vendor.

Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Tao Su <tao1.su@linux.intel.com>
Link: https://lore.kernel.org/r/20250627035129.2755537-2-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-14 10:27:07 +02:00
Zhao Liu
8113b7f0e6 i386/cpu: Enable 0x1f leaf for YongFeng by default
Host YongFeng CPU has 0x1f leaf by default, so that enable it for
Guest CPU by default as well.

Suggested-by: Ewan Hai <ewanhai-oc@zhaoxin.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250711104603.1634832-10-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-12 15:28:22 +02:00
Zhao Liu
1bf465f3e0 i386/cpu: Enable 0x1f leaf for SapphireRapids by default
Host SapphireRapids CPU has 0x1f leaf by default, so that enable it for
Guest CPU by default as well.

Suggested-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250711104603.1634832-9-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-12 15:28:22 +02:00
Zhao Liu
5dbdcdce06 i386/cpu: Enable 0x1f leaf for GraniteRapids by default
Host GraniteRapids CPU has 0x1f leaf by default, so that enable it for
Guest CPU by default as well.

Suggested-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250711104603.1634832-8-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-12 15:28:22 +02:00
Zhao Liu
af62bd3db0 i386/cpu: Enable 0x1f leaf for SierraForest by default
Host SierraForest CPU has 0x1f leaf by default, so that enable it for
Guest CPU by default as well.

Suggested-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250711104603.1634832-7-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-12 15:28:22 +02:00
Zhao Liu
e468c5e444 i386/cpu: Enable 0x1f leaf for SierraForest by default
Host SierraForest CPU has 0x1f leaf by default, so that enable it for
Guest CPU by default as well.

Suggested-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250711104603.1634832-7-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-12 15:28:22 +02:00
Manish Mishra
bf40325614 i386/cpu: Add a "x-force-cpuid-0x1f" property
Add a "x-force-cpuid-0x1f" property so that CPU models can enable it and
have 0x1f CPUID leaf natually as the Host CPU.

The advantage is that when the CPU model's cache model is already
consistent with the Host CPU, for example, SRF defaults to l2 per
module & l3 per package, 0x1f can better help users identify the
topology in the VM.

Adding 0x1f for specific CPU models should not cause any trouble in
principle. This property is only enabled for CPU models that already
have 0x1f leaf on the Host, so software that originally runs normally on
the Host won't encounter issues in the Guest with corresponding CPU
model. Conversely, some software that relies on checking 0x1f might
have problems in the Guest due to the lack of 0x1f [*]. In
summary, adding 0x1f is also intended to further emulate the Host CPU
environment.

[*]: https://lore.kernel.org/qemu-devel/PH0PR02MB738410511BF51B12DB09BE6CF6AC2@PH0PR02MB7384.namprd02.prod.outlook.com/

Signed-off-by: Manish Mishra <manish.mishra@nutanix.com>
Co-authored-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
[Integrated and rebased 2 previous patches (ordered by post time)]
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250711104603.1634832-6-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-12 15:28:22 +02:00
Ewan Hai
b1a3a090b2 i386/cpu: Introduce cache model for YongFeng
Add the cache model to YongFeng (v3) to better emulate its
environment.

Note, although YongFeng v2 was added after v10.0, it was also back
ported to v10.0.2. Therefore, the new version (v3) is needed to avoid
conflict.

The cache model is as follows:

      --- cache 0 ---
      cache type                         = data cache (1)
      cache level                        = 0x1 (1)
      self-initializing cache level      = true
      fully associative cache            = false
      maximum IDs for CPUs sharing cache = 0x0 (0)
      maximum IDs for cores in pkg       = 0x0 (0)
      system coherency line size         = 0x40 (64)
      physical line partitions           = 0x1 (1)
      ways of associativity              = 0x8 (8)
      number of sets                     = 0x40 (64)
      WBINVD/INVD acts on lower caches   = false
      inclusive to lower caches          = false
      complex cache indexing             = false
      number of sets (s)                 = 64
      (size synth)                       = 32768 (32 KB)
      --- cache 1 ---
      cache type                         = instruction cache (2)
      cache level                        = 0x1 (1)
      self-initializing cache level      = true
      fully associative cache            = false
      maximum IDs for CPUs sharing cache = 0x0 (0)
      maximum IDs for cores in pkg       = 0x0 (0)
      system coherency line size         = 0x40 (64)
      physical line partitions           = 0x1 (1)
      ways of associativity              = 0x10 (16)
      number of sets                     = 0x40 (64)
      WBINVD/INVD acts on lower caches   = false
      inclusive to lower caches          = false
      complex cache indexing             = false
      number of sets (s)                 = 64
      (size synth)                       = 65536 (64 KB)
      --- cache 2 ---
      cache type                         = unified cache (3)
      cache level                        = 0x2 (2)
      self-initializing cache level      = true
      fully associative cache            = false
      maximum IDs for CPUs sharing cache = 0x0 (0)
      maximum IDs for cores in pkg       = 0x0 (0)
      system coherency line size         = 0x40 (64)
      physical line partitions           = 0x1 (1)
      ways of associativity              = 0x8 (8)
      number of sets                     = 0x200 (512)
      WBINVD/INVD acts on lower caches   = false
      inclusive to lower caches          = true
      complex cache indexing             = false
      number of sets (s)                 = 512
      (size synth)                       = 262144 (256 KB)
      --- cache 3 ---
      cache type                         = unified cache (3)
      cache level                        = 0x3 (3)
      self-initializing cache level      = true
      fully associative cache            = false
      maximum IDs for CPUs sharing cache = 0x0 (0)
      maximum IDs for cores in pkg       = 0x0 (0)
      system coherency line size         = 0x40 (64)
      physical line partitions           = 0x1 (1)
      ways of associativity              = 0x10 (16)
      number of sets                     = 0x2000 (8192)
      WBINVD/INVD acts on lower caches   = true
      inclusive to lower caches          = true
      complex cache indexing             = false
      number of sets (s)                 = 8192
      (size synth)                       = 8388608 (8 MB)
      --- cache 4 ---
      cache type                         = no more caches (0)

Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Ewan Hai <ewanhai-oc@zhaoxin.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250711104603.1634832-5-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-12 15:28:22 +02:00
Zhao Liu
8d69fc2158 i386/cpu: Introduce cache model for SapphireRapids
Add the cache model to SapphireRapids (v4) to better emulate its
environment.

The cache model is based on SapphireRapids-SP (Scalable Performance):

      --- cache 0 ---
      cache type                         = data cache (1)
      cache level                        = 0x1 (1)
      self-initializing cache level      = true
      fully associative cache            = false
      maximum IDs for CPUs sharing cache = 0x1 (1)
      maximum IDs for cores in pkg       = 0x3f (63)
      system coherency line size         = 0x40 (64)
      physical line partitions           = 0x1 (1)
      ways of associativity              = 0xc (12)
      number of sets                     = 0x40 (64)
      WBINVD/INVD acts on lower caches   = false
      inclusive to lower caches          = false
      complex cache indexing             = false
      number of sets (s)                 = 64
      (size synth)                       = 49152 (48 KB)
      --- cache 1 ---
      cache type                         = instruction cache (2)
      cache level                        = 0x1 (1)
      self-initializing cache level      = true
      fully associative cache            = false
      maximum IDs for CPUs sharing cache = 0x1 (1)
      maximum IDs for cores in pkg       = 0x3f (63)
      system coherency line size         = 0x40 (64)
      physical line partitions           = 0x1 (1)
      ways of associativity              = 0x8 (8)
      number of sets                     = 0x40 (64)
      WBINVD/INVD acts on lower caches   = false
      inclusive to lower caches          = false
      complex cache indexing             = false
      number of sets (s)                 = 64
      (size synth)                       = 32768 (32 KB)
      --- cache 2 ---
      cache type                         = unified cache (3)
      cache level                        = 0x2 (2)
      self-initializing cache level      = true
      fully associative cache            = false
      maximum IDs for CPUs sharing cache = 0x1 (1)
      maximum IDs for cores in pkg       = 0x3f (63)
      system coherency line size         = 0x40 (64)
      physical line partitions           = 0x1 (1)
      ways of associativity              = 0x10 (16)
      number of sets                     = 0x800 (2048)
      WBINVD/INVD acts on lower caches   = false
      inclusive to lower caches          = false
      complex cache indexing             = false
      number of sets (s)                 = 2048
      (size synth)                       = 2097152 (2 MB)
      --- cache 3 ---
      cache type                         = unified cache (3)
      cache level                        = 0x3 (3)
      self-initializing cache level      = true
      fully associative cache            = false
      maximum IDs for CPUs sharing cache = 0x7f (127)
      maximum IDs for cores in pkg       = 0x3f (63)
      system coherency line size         = 0x40 (64)
      physical line partitions           = 0x1 (1)
      ways of associativity              = 0xf (15)
      number of sets                     = 0x10000 (65536)
      WBINVD/INVD acts on lower caches   = false
      inclusive to lower caches          = false
      complex cache indexing             = true
      number of sets (s)                 = 65536
      (size synth)                       = 62914560 (60 MB)
      --- cache 4 ---
      cache type                         = no more caches (0)

Suggested-by: Tejus GK <tejus.gk@nutanix.com>
Suggested-by: Jason Zeng <jason.zeng@intel.com>
Suggested-by: "Daniel P . Berrangé" <berrange@redhat.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Tao Su <tao1.su@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250711104603.1634832-4-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-12 15:28:22 +02:00
Zhao Liu
23fb57838d i386/cpu: Introduce cache model for GraniteRapids
Add the cache model to GraniteRapids (v3) to better emulate its
environment.

The cache model is based on GraniteRapids-SP (Scalable Performance):

      --- cache 0 ---
      cache type                         = data cache (1)
      cache level                        = 0x1 (1)
      self-initializing cache level      = true
      fully associative cache            = false
      maximum IDs for CPUs sharing cache = 0x1 (1)
      maximum IDs for cores in pkg       = 0x3f (63)
      system coherency line size         = 0x40 (64)
      physical line partitions           = 0x1 (1)
      ways of associativity              = 0xc (12)
      number of sets                     = 0x40 (64)
      WBINVD/INVD acts on lower caches   = false
      inclusive to lower caches          = false
      complex cache indexing             = false
      number of sets (s)                 = 64
      (size synth)                       = 49152 (48 KB)
      --- cache 1 ---
      cache type                         = instruction cache (2)
      cache level                        = 0x1 (1)
      self-initializing cache level      = true
      fully associative cache            = false
      maximum IDs for CPUs sharing cache = 0x1 (1)
      maximum IDs for cores in pkg       = 0x3f (63)
      system coherency line size         = 0x40 (64)
      physical line partitions           = 0x1 (1)
      ways of associativity              = 0x10 (16)
      number of sets                     = 0x40 (64)
      WBINVD/INVD acts on lower caches   = false
      inclusive to lower caches          = false
      complex cache indexing             = false
      number of sets (s)                 = 64
      (size synth)                       = 65536 (64 KB)
      --- cache 2 ---
      cache type                         = unified cache (3)
      cache level                        = 0x2 (2)
      self-initializing cache level      = true
      fully associative cache            = false
      maximum IDs for CPUs sharing cache = 0x1 (1)
      maximum IDs for cores in pkg       = 0x3f (63)
      system coherency line size         = 0x40 (64)
      physical line partitions           = 0x1 (1)
      ways of associativity              = 0x10 (16)
      number of sets                     = 0x800 (2048)
      WBINVD/INVD acts on lower caches   = false
      inclusive to lower caches          = false
      complex cache indexing             = false
      number of sets (s)                 = 2048
      (size synth)                       = 2097152 (2 MB)
      --- cache 3 ---
      cache type                         = unified cache (3)
      cache level                        = 0x3 (3)
      self-initializing cache level      = true
      fully associative cache            = false
      maximum IDs for CPUs sharing cache = 0xff (255)
      maximum IDs for cores in pkg       = 0x3f (63)
      system coherency line size         = 0x40 (64)
      physical line partitions           = 0x1 (1)
      ways of associativity              = 0x10 (16)
      number of sets                     = 0x48000 (294912)
      WBINVD/INVD acts on lower caches   = false
      inclusive to lower caches          = false
      complex cache indexing             = true
      number of sets (s)                 = 294912
      (size synth)                       = 301989888 (288 MB)
      --- cache 4 ---
      cache type                         = no more caches (0)

Suggested-by: Tejus GK <tejus.gk@nutanix.com>
Suggested-by: Jason Zeng <jason.zeng@intel.com>
Suggested-by: "Daniel P . Berrangé" <berrange@redhat.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Tao Su <tao1.su@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250711104603.1634832-3-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-12 15:28:22 +02:00
Zhao Liu
1f0a9ce6c0 i386/cpu: Introduce cache model for SierraForest
Add the cache model to SierraForest (v3) to better emulate its
environment.

The cache model is based on SierraForest-SP (Scalable Performance):

      --- cache 0 ---
      cache type                         = data cache (1)
      cache level                        = 0x1 (1)
      self-initializing cache level      = true
      fully associative cache            = false
      maximum IDs for CPUs sharing cache = 0x0 (0)
      maximum IDs for cores in pkg       = 0x3f (63)
      system coherency line size         = 0x40 (64)
      physical line partitions           = 0x1 (1)
      ways of associativity              = 0x8 (8)
      number of sets                     = 0x40 (64)
      WBINVD/INVD acts on lower caches   = false
      inclusive to lower caches          = false
      complex cache indexing             = false
      number of sets (s)                 = 64
      (size synth)                       = 32768 (32 KB)
      --- cache 1 ---
      cache type                         = instruction cache (2)
      cache level                        = 0x1 (1)
      self-initializing cache level      = true
      fully associative cache            = false
      maximum IDs for CPUs sharing cache = 0x0 (0)
      maximum IDs for cores in pkg       = 0x3f (63)
      system coherency line size         = 0x40 (64)
      physical line partitions           = 0x1 (1)
      ways of associativity              = 0x8 (8)
      number of sets                     = 0x80 (128)
      WBINVD/INVD acts on lower caches   = false
      inclusive to lower caches          = false
      complex cache indexing             = false
      number of sets (s)                 = 128
      (size synth)                       = 65536 (64 KB)
      --- cache 2 ---
      cache type                         = unified cache (3)
      cache level                        = 0x2 (2)
      self-initializing cache level      = true
      fully associative cache            = false
      maximum IDs for CPUs sharing cache = 0x7 (7)
      maximum IDs for cores in pkg       = 0x3f (63)
      system coherency line size         = 0x40 (64)
      physical line partitions           = 0x1 (1)
      ways of associativity              = 0x10 (16)
      number of sets                     = 0x1000 (4096)
      WBINVD/INVD acts on lower caches   = false
      inclusive to lower caches          = false
      complex cache indexing             = false
      number of sets (s)                 = 4096
      (size synth)                       = 4194304 (4 MB)
      --- cache 3 ---
      cache type                         = unified cache (3)
      cache level                        = 0x3 (3)
      self-initializing cache level      = true
      fully associative cache            = false
      maximum IDs for CPUs sharing cache = 0x1ff (511)
      maximum IDs for cores in pkg       = 0x3f (63)
      system coherency line size         = 0x40 (64)
      physical line partitions           = 0x1 (1)
      ways of associativity              = 0xc (12)
      number of sets                     = 0x24000 (147456)
      WBINVD/INVD acts on lower caches   = false
      inclusive to lower caches          = false
      complex cache indexing             = true
      number of sets (s)                 = 147456
      (size synth)                       = 113246208 (108 MB)
      --- cache 4 ---
      cache type                         = no more caches (0)

Suggested-by: Tejus GK <tejus.gk@nutanix.com>
Suggested-by: Jason Zeng <jason.zeng@intel.com>
Suggested-by: "Daniel P . Berrangé" <berrange@redhat.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Tao Su <tao1.su@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250711104603.1634832-2-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-12 15:28:22 +02:00
Zhao Liu
2141898bb9 i386/cpu: Use a unified cache_info in X86CPUState
At present, all cases using the cache model (CPUID 0x2, 0x4, 0x80000005,
0x80000006 and 0x8000001D leaves) have been verified to be able to
select either cache_info_intel or cache_info_amd based on the vendor.

Therefore, further merge cache_info_intel and cache_info_amd into a
unified cache_info in X86CPUState, and during its initialization, set
different legacy cache models based on the vendor.

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250711102143.1622339-19-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-12 15:28:22 +02:00
Zhao Liu
25acae4c6e i386/cpu: Select legacy cache model based on vendor in CPUID 0x8000001D
As preparation for merging cache_info_cpuid4 and cache_info_amd in
X86CPUState, set legacy cache model based on vendor in the CPUID
0x8000001D leaf. For AMD CPU, select legacy AMD cache model (in
cache_info_amd) as the default cache model like before, otherwise,
select legacy Intel cache model (in cache_info_cpuid4).

In fact, for Intel (and Zhaoxin) CPU, this change is safe because the
extended CPUID level supported by Intel is up to 0x80000008. So Intel
Guest doesn't have this 0x8000001D leaf.

Although someone could bump "xlevel" up to 0x8000001D for Intel Guest,
it's meaningless and this is undefined behavior. This leaf should be
considered reserved, but the SDM does not explicitly state this. So,
there's no need to specifically use vendor_cpuid_only_v2 to fix
anything, as it doesn't even qualify as a fix since nothing is
currently broken.

Therefore, it is acceptable to select the default legacy cache model
based on the vendor.

For the CPUID 0x8000001D leaf, in X86CPUState, a unified cache_info is
enough. It only needs to be initialized and configured with the
corresponding legacy cache model based on the vendor.

Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250711102143.1622339-18-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-12 15:28:22 +02:00
Zhao Liu
00fa96c96a i386/cpu: Select legacy cache model based on vendor in CPUID 0x80000006
As preparation for merging cache_info_cpuid4 and cache_info_amd in
X86CPUState, set legacy cache model based on vendor in the CPUID
0x80000006 leaf. For AMD CPU, select legacy AMD cache model (in
cache_info_amd) as the default cache model like before, otherwise,
select legacy Intel cache model (in cache_info_cpuid4).

To ensure compatibility is not broken, add an enable_legacy_vendor_cache
flag based on x-vendor-only-v2 to indicate cases where the legacy cache
model should be used regardless of the vendor. For CPUID 0x80000006 leaf,
enable_legacy_vendor_cache flag indicates to pick legacy Intel cache
model, which is for compatibility with the behavior of PC machine v10.0
and older.

The following explains how current vendor-based default legacy cache
model ensures correctness without breaking compatibility.

* For the PC machine v6.0 and older, vendor_cpuid_only=false, and
  vendor_cpuid_only_v2=false.

  - If the named CPU model has its own cache model, and doesn't use
    legacy cache model (legacy_cache=false), then cache_info_cpuid4 and
    cache_info_amd are same, so 0x80000006 leaf uses its own cache model
    regardless of the vendor.

  - For max/host/named CPU (without its own cache model), then the flag
    enable_legacy_vendor_cache is true, they will use legacy AMD cache
    model just like their previous behavior.

* For the PC machine v10.0 and older (to v6.1), vendor_cpuid_only=true,
  and vendor_cpuid_only_v2=false.

  - No change, since this leaf doesn't aware vendor_cpuid_only.

* For the PC machine v10.1 and newer, vendor_cpuid_only=true, and
  vendor_cpuid_only_v2=true.

  - If the named CPU model has its own cache model (legacy_cache=false),
    then cache_info_cpuid4 & cache_info_amd both equal to its own cache
    model, so it uses its own cache model in 0x80000006 leaf regardless
    of the vendor. Intel and Zhaoxin CPUs have their special encoding
    based on SDM, which is the expected behavior and no different from
    before.

  - For max/host/named CPU (without its own cache model), then the flag
    enable_legacy_vendor_cache is false, the legacy cache model is
    selected based on vendor.

    For AMD CPU, it will use legacy AMD cache as before.

    For non-AMD (Intel/Zhaoxin) CPU, it will use legacy Intel cache and
    be encoded based on SDM as expected.

    Here, selecting the legacy cache model based on the vendor does not
    change the previous (before the change) behavior.

Therefore, the above analysis proves that, with the help of the flag
enable_legacy_vendor_cache, it is acceptable to select the default
legacy cache model based on the vendor.

For the CPUID 0x80000006 leaf, in X86CPUState, a unified cache_info is
enough. It only needs to be initialized and configured with the
corresponding legacy cache model based on the vendor.

Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250711102143.1622339-17-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-12 15:28:22 +02:00