Match the contents of table 2-17 ("#UD Exception and VEX.L Field Encoding")
in the SDM, for instruction in exception class 5. They were incorrectly
accepting 256-bit versions that do not exist.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 2eb8d9734355ed86e162dce2a3f265ffee4005ed)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Fix typo to avoid the following build failure:
target/i386/nvmm/nvmm-all.c: In function 'nvmm_init_vcpu':
target/i386/nvmm/nvmm-all.c:988:9: error: 'AccelCPUState' has no member named 'vcpu_dirty'
988 | qcpu->vcpu_dirty = true;
| ^~
Cc: qemu-stable@nongnu.org
Reported-by: Thomas Huth <thuth@redhat.com>
Fixes: 2098164a6b ("accel/nvmm: Replace @dirty field by generic CPUState::vcpu_dirty field")
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Tested-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20260113203924.81560-1-philmd@linaro.org>
(cherry picked from commit 7be4256281f430f726366c92ffdea0b72651de8a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
VEX is only forbidden in real and vm86 mode; 16-bit protected mode supports
it for some unfathomable reason.
Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit ed88bdcfbdcf9d411607cd690f93f915feff6a5b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
VSIB can have either 32-bit or 64-bit addresses, pass a constant mask to
the helper and apply it before the load.
Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 5e3572ef2e94608568b1a73eab9d382b250936eb)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
If the vex_special field was not initialized, it was considered to be
X86_VEX_SSEUnaligned (whose value was zero). Add a new value to
fix that.
Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 73dd6e4a36dd8d85548292f382a4d479e2810371)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
From the manual: "In 64-bit mode all 4 bits may be used. [...]
In 32-bit and 16-bit modes bit 6 must be 1 (if bit 6 is not 1, the
2-byte VEX version will generate LDS instruction and the 3-byte VEX
version will ignore this bit)."
Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 0db1b556e4bcd7a51f222cda9e14850f88fe3f88)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
While the (intno << shift) expression is correct for indexing the IDT based on
whether Long Mode is active, the error code itself was unchanged with AMD64,
and is still the index with 3 bits of metadata in the bottom.
Found when running a Xen unit test, all under QEMU. The unit test objected to
being told there was an error with IDT index 256 when INT $0x80 (128) was the
problem instruction:
...
Error: Unexpected fault 0x800d0802, #GP[IDT[256]]
...
Fixes: d2fd1af767 ("x86_64 linux user emulation")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Link: https://lore.kernel.org/r/20250312000603.3666083-1-andrew.cooper3@citrix.com
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3160
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 60efba3c1bff0d78632d45c2dc927c5bc7a17ba8)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Recent changes introduced build errors in the i386 HVF backend:
- ../accel/hvf/hvf-accel-ops.c:163:17: error: no member named 'guest_debug_enabled' in 'struct AccelCPUState'
163 | cpu->accel->guest_debug_enabled = false;
- ../accel/hvf/hvf-accel-ops.c:151:51
error: no member named 'unblock_ipi_mask' in 'struct AccelCPUState'
- ../target/i386/hvf/hvf.c:736:5
error: use of undeclared identifier 'rip'
- ../target/i386/hvf/hvf.c:737:5
error: use of undeclared identifier 'env'
This patch corrects the field usage and move identifier to correct
function ensuring successful compilation of the i386 HVF backend.
These issues were caused by:
Fixes: 2ad756383e (“accel/hvf: Restrict ARM-specific fields of AccelCPUState”)
Fixes: 2a21c92447 (“target/i386/hvf: Factor hvf_handle_vmexit() out”)
Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20251126094601.56403-1-phind.uet@gmail.com>
[PMD: Keep setting vcpu_dirty on AArch64]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Message-Id: <20251128085854.53539-1-phind.uet@gmail.com>
The stack can be 32-bit even in real mode, and in this case
the stack pointer must be updated in its entirety rather than
just the bottom 16 bits. The same is true of real mode IRET,
for which there was even a comment suggesting the right thing
to do.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1506
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The exit_code parameter of cpu_vmexit is declared as uint32_t, but exit
codes are 64 bits wide according to the AMD SVM specification. And because
uint32_t is unsigned, this causes exit codes to be zero-extended, for example
writing SVM_EXIT_ERR as 0xffff_ffff instead of the expected 0xffff_ffff_ffff_ffff.
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2977
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Correctly reject invalid segment registers, including CS when used as
the destination of a MOV. Ignore the REX prefix as well.
Fixes: 5e9e21bcc4 ("target/i386: move 60-BF opcodes to new decoder", 2024-05-07)
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3195
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There are a small set of binary SSE insns which have no MMX
equivalent, which we create the gen functions for with the
BINARY_INT_SSE() macro. This forwards to gen_binary_int_sse() with a
NULL pointer for 'mmx'.
For almost all of these insns we correctly mark them in the decode
table as not permitting a zero prefix byte; however we got this wrong
for VPERMILPS, with the result that a bogus instruction would get
through the decode checks and end up in gen_binary_int_sse() trying
to call a NULL pointer.
Correct the decode table entry for VPERMILPS so that we get the
expected #UD exception.
In the x86 SDM, table A-4 "Three-byte Opcode Map: 08H-FFH
(First Two Bytes are 0F 38H)" confirms that there is no pfx 0
version of VPERMILPS.
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3199
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Link: https://lore.kernel.org/r/20251114175417.2794804-1-peter.maydell@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently, this function is only available in MSHV. If a different accelerator
is used, and the code jumps to this section, a segfault will occur.
(I ran into this with HVF)
Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Link: https://lore.kernel.org/r/20251114082915.71884-2-phind.uet@gmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In the decode_group9() function, if we don't recognise the insn as
one that we should handle, we leave the 'entry' pointer unaltered.
Because the X86OpEntry struct has a union for the gen and decode
pointers, this means that the top level code will call decode.e.gen()
which tries to use the decode function pointer (still set to
decode_group9) as a gen function pointer.
This is undefined behaviour, but seems to be mostly harmless in
practice (we call decode_group9() again with bogus arguments and it
does nothing). If you have CFI enabled then it will trip the CFI
check:
../target/i386/tcg/decode-new.c.inc:2862:9: runtime error: control flow integrity check for type 'void (struct DisasContext *, struct X86DecodedInsn *)' failed during indirect function call
Set *entry to UNKNOWN_OPCODE to provoke the #UD exception, as we do
in decode_group1A() and decode_group11() for similar situations.
Thanks to the bug reporter for the clear description and analysis of
the bug and the simple reproducer.
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3172
Fixes: fcd16539eb ("target/i386: convert CMPXCHG8B/CMPXCHG16B to new decoder")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20251021173152.1695997-1-peter.maydell@linaro.org>
Function migrate_add_blocker will free the reason and set it to NULL
if failure is returned.
Signed-off-by: Bin Guo <guobin@linux.alibaba.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Link: https://lore.kernel.org/r/20251024205532.19883-1-guobin@linux.alibaba.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Add IgvmNativeVpContextX64 struct holding the register state (see igvm
spec), and the qigvm_x86_load_context() function to load the register
state.
Wire up using two new functions: qigvm_x86_set_vp_context() is called
from igvm file handling code and stores the boot processor context.
qigvm_x86_bsp_reset() is called from i386 target cpu reset code and
loads the context into the cpu registers.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-ID: <20251029105555.2492276-5-kraxel@redhat.com>
Add and wire up qigvm_x86_get_mem_map_entry function which converts the
e820 table into an igvm memory map parameter. This makes igvm files for
the native (non-confidential) platform with memory map parameter work.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-ID: <20251029105555.2492276-4-kraxel@redhat.com>
Similarly to 1d78a3c3ab for KVM, wrap hv_vcpu_run() with
cpu_exec_start/end(), so that the accelerator can perform
pending operations while all vCPUs are quiescent. See also
explanation in commit c265e976f4 ("cpus-common: lock-free
fast path for cpu_exec_start/end").
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Factor hvf_handle_vmexit() out of hvf_arch_vcpu_exec().
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
hvf_put_registers() and hvf_get_registers() are implemented per
target, rename them using the 'hvf_arch_' prefix following the
per target pattern.
Since they call hv_vcpu_set_reg() / hv_vcpu_get_reg(), mention
they must be called on the vCPU.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Mads Ynddal <mads@ynddal.dk>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
hvf_vcpu_exec() is implemented per target, rename it as
hvf_arch_vcpu_exec(), following the per target pattern.
Since it calls hv_vcpu_run(), mention it must be called
on the vCPU.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Mads Ynddal <mads@ynddal.dk>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We recently introduced API for registering callbacks for trap related
events as well as the corresponding hook functions. Due to differences
between architectures, the latter need to be called from target specific
code.
This change places the hook for x86 targets.
Signed-off-by: Julian Ganz <neither@nut.email>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20251027110344.2289945-16-alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Coverity complains because we assign to ret here but
then never read it again before we overwrite it with
the call to set_x64_registers().
Analyzed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The SMM cpu address space is initialized in a machine_init_done
notifier. It only runs once when QEMU starts up, which leads to the
issue that for any hotplugged CPU after the machine is ready, SMM
cpu address space doesn't get initialized.
Fix the issue by initializing the SMM cpu address space in x86_cpu_plug()
when the cpu is hotplugged.
Fixes: 591f817d81 ("target/i386: Define enum X86ASIdx for x86's address spaces")
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Closes: https://lore.kernel.org/qemu-devel/CAFEAcA_3kkZ+a5rTZGmK8W5K6J7qpYD31HkvjBnxWr-fGT2h_A@mail.gmail.com/
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20251014094216.164306-2-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Avoids the `current_cpu` global and seems more robust by not "forgetting" the
own APIC and then re-determining it by cpu_get_current_apic() which uses the
global.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20251019210303.104718-9-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Makes the APIC API more type-safe by resolving quite a few APIC_COMMON
downcasts.
Like PICCommonState, the APICCommonState is now a public typedef while staying
an abstract datatype.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20251019210303.104718-8-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Commit b7ecba0f6f ("docs/devel/loads-stores.rst: Document our
various load and store APIs") mentioned cpu_physical_memory_*()
methods are legacy, the replacement being address_space_*().
Replace:
- cpu_physical_memory_read(len=4) -> address_space_ldl()
- cpu_physical_memory_read(len=8) -> address_space_ldq()
inlining the little endianness conversion via the '_le' suffix.
As with the previous implementation, ignore whether the memory
read succeeded or failed.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Message-Id: <20251002145742.75624-3-philmd@linaro.org>
We want to replace the cpu_physical_memory_read() calls by
address_space_read() equivalents. Since the latter requires
an address space, and these commands are run in the context
of a vCPU, propagate its first address space. Next commit
will do the replacements.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Message-Id: <20251002145742.75624-2-philmd@linaro.org>
Join the 3 KVM_PUT_*_STATE definitions in a single enum.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Link: https://lore.kernel.org/r/20251008040715.81513-3-philmd@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In user-mode emulation, QEMU uses the default setting of the LDT base
and limit, which places it at the bottom 64K of virtual address space.
However, by default there is no LDT at all in Linux processes, and
therefore the limit should be 0.
This is visible as a NULL pointer dereference in LSL and LAR instructions
when they try to read the LDT at an unmapped address.
Resolves: #1376
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The T bit is bit 0 of the 16-bit word at offset 100 of the TSS. However,
accessing it with a 32-bit word is not really correct, because bytes
102-103 contain the I/O map base address (relative to the base of the
TSS) and bits 1-15 are reserved. In particular, any task switch to a TSS that
has a nonzero I/O map base address is broken.
This fixes the eventinj and taskswitch tests in kvm-unit-tests.
Cc: qemu-stable@nongnu.org
Fixes: ad441b8b79 ("target/i386: implement TSS trap bit", 2025-05-12)
Reported-by: Thomas Huth <thuth@redhat.com>
Closes: https://gitlab.com/qemu-project/qemu/-/issues/3101
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For x86_64 a 16 bit push op (pushw) of a memory address would generate
a 64 bit store on the stack instead of a 16 bit store.
For example:
pushw (%rax)
behaves like
pushq (%rax)
which is incorrect.
This patch fixes that.
Signed-off-by: Thomas Ogrisegg <tom-bugs-qemu@fnord.at>
Link: https://lore.kernel.org/r/20250715210307.GA1115@x1.fnord.at
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
do_smm_enter and helper_rsm sets the env->dr, but does not sync the
values with cpu_x86_update_dr7. A malicious kernel may control the
instruction pointer in SMM by setting a breakpoint on the SMI
entry point, and after do_smm_enter cpu->breakpoints contains the
stale breakpoint; and because IDT is not reloaded upon SMI entry,
the debug exception handler controlled by the malicious kernel
is invoked.
Fixes: 01df040b52 ("x86: Debug register emulation (Jan Kiszka)")
Reported-by: unvariant.winter@gmail.com
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Link: https://lore.kernel.org/r/2bacb9b24e9d337dbe48791aa25d349eb9c52c3a.1758794468.git.zhuyifei@google.com
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[commit message by YiFei Zhu]
A malicious kernel may control the instruction pointer in SMM in a
multi-processor VM by sending a sequence of IPIs via APIC:
CPU0 CPU1
IPI(CPU1, MODE_INIT)
x86_cpu_exec_reset()
apic_init_reset()
s->wait_for_sipi = true
IPI(CPU1, MODE_SMI)
do_smm_enter()
env->hflags |= HF_SMM_MASK;
IPI(CPU1, MODE_STARTUP, vector)
do_cpu_sipi()
apic_sipi()
/* s->wait_for_sipi check passes */
cpu_x86_load_seg_cache_sipi(vector)
A different sequence, SMI INIT SIPI, is also buggy in TCG because
INIT is not blocked or latched during SMM. However, it is not
vulnerable to an instruction pointer control in the same way because
x86_cpu_exec_reset clears env->hflags, exiting SMM.
Fixes: a9bad65d2c ("target-i386: wake up processors that receive an SMI")
Analyzed-by: YiFei Zhu <zhuyifei@google.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Newer Intel hardware (Sapphire Rapids and higher) sets multiple MDS
immunity bits in MSR_IA32_ARCH_CAPABILITIES but lacks the hardware-level
MSR_ARCH_CAP_FB_CLEAR (bit 17):
ARCH_CAP_MDS_NO
ARCH_CAP_TAA_NO
ARCH_CAP_PSDP_NO
ARCH_CAP_FBSDP_NO
ARCH_CAP_SBDR_SSDP_NO
This prevents VMs with fb-clear=on from migrating from older hardware
(Cascade Lake, Ice Lake) to newer hardware, limiting live migration
capabilities. Note fb-clear was first introduced in v8.1.0 [1].
Expose MSR_ARCH_CAP_FB_CLEAR for MDS-invulnerable systems to enable
seamless migration between hardware generations.
Note: There is no impact when a guest migrates to newer hardware as
the existing bit combinations already mark the host as MMIO-immune and
disable FB_CLEAR operations in the kernel (see Linux's
arch_cap_mmio_immune() and vmx_update_fb_clear_dis()). See kernel side
discussion for [2] for additional context.
[1] 22e1094ca8 ("target/i386: add support for FB_CLEAR feature")
[2] https://patchwork.kernel.org/project/kvm/patch/20250401044931.793203-1-jon@nutanix.com/
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jon Kohler <jon@nutanix.com>
Link: https://lore.kernel.org/r/20251008202557.4141285-1-jon@nutanix.com
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Transient Scheduler Attacks (TSA) are new speculative side channel attacks
related to the execution timing of instructions under specific
microarchitectural conditions. In some cases, an attacker may be able to
use this timing information to infer data from other contexts, resulting in
information leakage
CPUID Fn8000_0021 EAX[5] (VERW_CLEAR). If this bit is 1, the memory form of
the VERW instruction may be used to help mitigate TSA.
Link: https://www.amd.com/content/dam/amd/en/documents/resources/bulletin/technical-guidance-for-mitigating-transient-scheduler-attacks.pdf
Co-developed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/e6362672e3a67a9df661a8f46598335a1a2d2754.1752176771.git.babu.moger@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Transient Scheduler Attacks (TSA) are new speculative side channel attacks
related to the execution timing of instructions under specific
microarchitectural conditions. In some cases, an attacker may be able to
use this timing information to infer data from other contexts, resulting in
information leakage.
AMD has identified two sub-variants two variants of TSA.
CPUID Fn8000_0021 ECX[1] (TSA_SQ_NO).
If this bit is 1, the CPU is not vulnerable to TSA-SQ.
CPUID Fn8000_0021 ECX[2] (TSA_L1_NO).
If this bit is 1, the CPU is not vulnerable to TSA-L1.
Add the new feature word FEAT_8000_0021_ECX and corresponding bits to
detect TSA variants.
Link: https://www.amd.com/content/dam/amd/en/documents/resources/bulletin/technical-guidance-for-mitigating-transient-scheduler-attacks.pdf
Co-developed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/12881b2c03fa351316057ddc5f39c011074b4549.1752176771.git.babu.moger@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There are hvcalls that are invoked during MMIO exits, the payload is of
dynamic size. To avoid heap allocations we can use preallocated pages as
in/out buffer for those calls. A page is reserved per vCPU and used for
set/get register hv calls.
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250916164847.77883-26-magnuskulke@linux.microsoft.com
[Use standard MAX_CONST macro; mshv.h/mshv_int.h split. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
QEMU maps certain regions into the guest multiple times, as seen in the
trace below. Currently the MSHV kernel driver will reject those
mappings. To workaround this, a record is kept (a static global list of
"slots", inspired by what the HVF accelerator has implemented). An
overlapping region is not registered at the hypervisor, and marked as
mapped=false. If there is an UNMAPPED_GPA exit, we can look for a slot
that is unmapped and would cover the GPA. In this case we map out the
conflicting slot and map in the requested region.
mshv_set_phys_mem add=1 name=pc.bios
mshv_map_memory => u_a=7ffff4e00000 gpa=00fffc0000 size=00040000
mshv_set_phys_mem add=1 name=ioapic
mshv_set_phys_mem add=1 name=hpet
mshv_set_phys_mem add=0 name=pc.ram
mshv_unmap_memory u_a=7fff67e00000 gpa=0000000000 size=80000000
mshv_set_phys_mem add=1 name=pc.ram
mshv_map_memory u_a=7fff67e00000 gpa=0000000000 size=000c0000
mshv_set_phys_mem add=1 name=pc.rom
mshv_map_memory u_a=7ffff4c00000 gpa=00000c0000 size=00020000
mshv_set_phys_mem add=1 name=pc.bios
mshv_remap_attempt => u_a=7ffff4e20000 gpa=00000e0000 size=00020000
The mapping table is guarded by a mutex for concurrent modification and
RCU mechanisms for concurrent reads. Writes occur rarely, but we'll have
to verify whether an unmapped region exist for each UNMAPPED_GPA exit,
which happens frequently.
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250916164847.77883-24-magnuskulke@linux.microsoft.com
[Fix format strings for trace-events; mshv.h/mshv_int.h split. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add the main vCPU execution loop for MSHV using the MSHV_RUN_VP ioctl.
The execution loop handles guest entry and VM exits. There are handlers for
memory r/w, PIO and MMIO to which the exit events are dispatched.
In case of MMIO the i386 instruction decoder/emulator is invoked to
perform the operation in user space.
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250916164847.77883-23-magnuskulke@linux.microsoft.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Connect the x86 instruction decoder and emulator to the MSHV backend
to handle intercepted instructions. This enables software emulation
of MMIO operations in MSHV guests. MSHV has a translate_gva hypercall
that is used to accessing the physical guest memory.
A guest might read from unmapped memory regions (e.g. OVMF will probe
0xfed40000 for a vTPM). In those cases 0xFF bytes is returned instead of
aborting the execution.
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250916164847.77883-21-magnuskulke@linux.microsoft.com
[mshv.h/mshv_int.h split. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Convert the guest CPU's CPUID model into MSHV's format and register it
with the hypervisor. This ensures that the guest observes the correct
CPU feature set during CPUID instructions.
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250916164847.77883-19-magnuskulke@linux.microsoft.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Write CPU register state to MSHV vCPUs. Various mapping functions to
prepare the payload for the HV call have been implemented.
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250916164847.77883-17-magnuskulke@linux.microsoft.com
[mshv.h/mshv_int.h split. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Retrieve special registers (e.g. segment, control, and descriptor
table registers) from MSHV vCPUs.
Various helper functions to map register state representations between
Qemu and MSHV are introduced.
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250916164847.77883-16-magnuskulke@linux.microsoft.com
[mshv.h/mshv_int.h split. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>