Commit graph

2602 commits

Author SHA1 Message Date
Paolo Bonzini
f48aaf926e target/i386/tcg: fix a few instructions that do not support VEX.L=1
Match the contents of table 2-17 ("#UD Exception and VEX.L Field Encoding")
in the SDM, for instruction in exception class 5.  They were incorrectly
accepting 256-bit versions that do not exist.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 2eb8d9734355ed86e162dce2a3f265ffee4005ed)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2026-01-28 12:01:22 +03:00
Philippe Mathieu-Daudé
4131a1d83c accel/nvmm: Fix 'cpu' typo in nvmm_init_vcpu()
Fix typo to avoid the following build failure:

  target/i386/nvmm/nvmm-all.c: In function 'nvmm_init_vcpu':
  target/i386/nvmm/nvmm-all.c:988:9: error: 'AccelCPUState' has no member named 'vcpu_dirty'
    988 |     qcpu->vcpu_dirty = true;
        |         ^~

Cc: qemu-stable@nongnu.org
Reported-by: Thomas Huth <thuth@redhat.com>
Fixes: 2098164a6b ("accel/nvmm: Replace @dirty field by generic CPUState::vcpu_dirty field")
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Tested-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20260113203924.81560-1-philmd@linaro.org>
(cherry picked from commit 7be4256281f430f726366c92ffdea0b72651de8a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2026-01-18 20:29:45 +03:00
Paolo Bonzini
11e286fb93 target/i386/tcg: allow VEX in 16-bit protected mode
VEX is only forbidden in real and vm86 mode; 16-bit protected mode supports
it for some unfathomable reason.

Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit ed88bdcfbdcf9d411607cd690f93f915feff6a5b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2026-01-16 14:29:07 +03:00
Paolo Bonzini
6594e50e7e target/i386/tcg: mask addresses for VSIB
VSIB can have either 32-bit or 64-bit addresses, pass a constant mask to
the helper and apply it before the load.

Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 5e3572ef2e94608568b1a73eab9d382b250936eb)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2026-01-16 14:28:56 +03:00
Paolo Bonzini
51bc24d427 target/i386/tcg: do not mark all SSE instructions as unaligned
If the vex_special field was not initialized, it was considered to be
X86_VEX_SSEUnaligned (whose value was zero).  Add a new value to
fix that.

Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 73dd6e4a36dd8d85548292f382a4d479e2810371)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2026-01-16 14:28:45 +03:00
Paolo Bonzini
59c9137156 target/i386/tcg: ignore V3 in 32-bit mode
From the manual: "In 64-bit mode all 4 bits may be used. [...]
In 32-bit and 16-bit modes bit 6 must be 1 (if bit 6 is not 1, the
2-byte VEX version will generate LDS instruction and the 3-byte VEX
version will ignore this bit)."

Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 0db1b556e4bcd7a51f222cda9e14850f88fe3f88)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-12-29 10:44:56 +03:00
Andrew Cooper
b33a563281 target/i386: Fix #GP error code for INT instructions
While the (intno << shift) expression is correct for indexing the IDT based on
whether Long Mode is active, the error code itself was unchanged with AMD64,
and is still the index with 3 bits of metadata in the bottom.

Found when running a Xen unit test, all under QEMU.  The unit test objected to
being told there was an error with IDT index 256 when INT $0x80 (128) was the
problem instruction:

  ...
  Error: Unexpected fault 0x800d0802, #GP[IDT[256]]
  ...

Fixes: d2fd1af767 ("x86_64 linux user emulation")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Link: https://lore.kernel.org/r/20250312000603.3666083-1-andrew.cooper3@citrix.com
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3160
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 60efba3c1bff0d78632d45c2dc927c5bc7a17ba8)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-12-29 10:44:47 +03:00
Nguyen Dinh Phi
3bee93b9ab accel/hvf: Fix i386 HVF compilation failures
Recent changes introduced build errors in the i386 HVF backend:

 - ../accel/hvf/hvf-accel-ops.c:163:17: error: no member named 'guest_debug_enabled' in 'struct AccelCPUState'
   163 |     cpu->accel->guest_debug_enabled = false;

 - ../accel/hvf/hvf-accel-ops.c:151:51
   error: no member named 'unblock_ipi_mask' in 'struct AccelCPUState'

 - ../target/i386/hvf/hvf.c:736:5
   error: use of undeclared identifier 'rip'

 - ../target/i386/hvf/hvf.c:737:5
   error: use of undeclared identifier 'env'

This patch corrects the field usage and move identifier to correct
function ensuring successful compilation of the i386 HVF backend.

These issues were caused by:

Fixes: 2ad756383e (“accel/hvf: Restrict ARM-specific fields of AccelCPUState”)
Fixes: 2a21c92447 (“target/i386/hvf: Factor hvf_handle_vmexit() out”)

Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20251126094601.56403-1-phind.uet@gmail.com>
[PMD: Keep setting vcpu_dirty on AArch64]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Message-Id: <20251128085854.53539-1-phind.uet@gmail.com>
2025-12-01 21:21:16 +01:00
Paolo Bonzini
106d766c9d target/i386: fix stack size when delivering real mode interrupts
The stack can be 32-bit even in real mode, and in this case
the stack pointer must be updated in its entirety rather than
just the bottom 16 bits.  The same is true of real mode IRET,
for which there was even a comment suggesting the right thing
to do.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1506
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-11-17 09:49:26 +01:00
Paolo Bonzini
9c3afb9d9b target/i386: svm: fix sign extension of exit code
The exit_code parameter of cpu_vmexit is declared as uint32_t, but exit
codes are 64 bits wide according to the AMD SVM specification.  And because
uint32_t is unsigned, this causes exit codes to be zero-extended, for example
writing SVM_EXIT_ERR as 0xffff_ffff instead of the expected 0xffff_ffff_ffff_ffff.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2977
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-11-17 09:49:26 +01:00
Paolo Bonzini
ebb46ba6a4 target/i386/tcg: validate segment registers
Correctly reject invalid segment registers, including CS when used as
the destination of a MOV.  Ignore the REX prefix as well.

Fixes: 5e9e21bcc4 ("target/i386: move 60-BF opcodes to new decoder", 2024-05-07)
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3195
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-11-17 09:49:26 +01:00
Peter Maydell
ebd9ea2947 target/i386: Mark VPERMILPS as not valid with prefix 0
There are a small set of binary SSE insns which have no MMX
equivalent, which we create the gen functions for with the
BINARY_INT_SSE() macro.  This forwards to gen_binary_int_sse() with a
NULL pointer for 'mmx'.

For almost all of these insns we correctly mark them in the decode
table as not permitting a zero prefix byte; however we got this wrong
for VPERMILPS, with the result that a bogus instruction would get
through the decode checks and end up in gen_binary_int_sse() trying
to call a NULL pointer.

Correct the decode table entry for VPERMILPS so that we get the
expected #UD exception.

In the x86 SDM, table A-4 "Three-byte Opcode Map: 08H-FFH
(First Two Bytes are 0F 38H)" confirms that there is no pfx 0
version of VPERMILPS.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3199
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Link: https://lore.kernel.org/r/20251114175417.2794804-1-peter.maydell@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-11-17 09:49:25 +01:00
Nguyen Dinh Phi
46b06eaeb4 target/i386: emulate: Make sure fetch_instruction exist before calling it
Currently, this function is only available in MSHV. If a different accelerator
is used, and the code jumps to this section, a segfault will occur.
(I ran into this with HVF)

Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Link: https://lore.kernel.org/r/20251114082915.71884-2-phind.uet@gmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-11-17 09:49:25 +01:00
Peter Maydell
4f503afc7e target/x86: Correctly handle invalid 0x0f 0xc7 0xxx insns
In the decode_group9() function, if we don't recognise the insn as
one that we should handle, we leave the 'entry' pointer unaltered.
Because the X86OpEntry struct has a union for the gen and decode
pointers, this means that the top level code will call decode.e.gen()
which tries to use the decode function pointer (still set to
decode_group9) as a gen function pointer.

This is undefined behaviour, but seems to be mostly harmless in
practice (we call decode_group9() again with bogus arguments and it
does nothing).  If you have CFI enabled then it will trip the CFI
check:

../target/i386/tcg/decode-new.c.inc:2862:9: runtime error: control flow integrity check for type 'void (struct DisasContext *, struct X86DecodedInsn *)' failed during indirect function call

Set *entry to UNKNOWN_OPCODE to provoke the #UD exception, as we do
in decode_group1A() and decode_group11() for similar situations.

Thanks to the bug reporter for the clear description and analysis of
the bug and the simple reproducer.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3172
Fixes: fcd16539eb ("target/i386: convert CMPXCHG8B/CMPXCHG16B to new decoder")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20251021173152.1695997-1-peter.maydell@linaro.org>
2025-11-10 12:02:45 +01:00
Bin Guo
74343a438c migration: Don't free the reason after calling migrate_add_blocker
Function migrate_add_blocker will free the reason and set it to NULL
if failure is returned.

Signed-off-by: Bin Guo <guobin@linux.alibaba.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Link: https://lore.kernel.org/r/20251024205532.19883-1-guobin@linux.alibaba.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2025-11-03 16:04:10 -05:00
Gerd Hoffmann
593fe98d74 igvm: add support for initial register state load in native mode
Add IgvmNativeVpContextX64 struct holding the register state (see igvm
spec), and the qigvm_x86_load_context() function to load the register
state.

Wire up using two new functions: qigvm_x86_set_vp_context() is called
from igvm file handling code and stores the boot processor context.
qigvm_x86_bsp_reset() is called from i386 target cpu reset code and
loads the context into the cpu registers.

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-ID: <20251029105555.2492276-5-kraxel@redhat.com>
2025-11-03 07:38:53 +01:00
Gerd Hoffmann
13abf2fcb7 igvm: add support for igvm memory map parameter in native mode
Add and wire up qigvm_x86_get_mem_map_entry function which converts the
e820 table into an igvm memory map parameter.  This makes igvm files for
the native (non-confidential) platform with memory map parameter work.

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-ID: <20251029105555.2492276-4-kraxel@redhat.com>
2025-11-03 07:38:53 +01:00
Philippe Mathieu-Daudé
5f34a5b642 accel/hvf: Guard hv_vcpu_run() between cpu_exec_start/end() calls
Similarly to 1d78a3c3ab for KVM, wrap hv_vcpu_run() with
cpu_exec_start/end(), so that the accelerator can perform
pending operations while all vCPUs are quiescent. See also
explanation in commit c265e976f4 ("cpus-common: lock-free
fast path for cpu_exec_start/end").

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-10-31 16:26:46 +00:00
Philippe Mathieu-Daudé
2a21c92447 target/i386/hvf: Factor hvf_handle_vmexit() out
Factor hvf_handle_vmexit() out of hvf_arch_vcpu_exec().

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-10-31 16:26:45 +00:00
Philippe Mathieu-Daudé
1182ede151 accel/hvf: Rename hvf_put|get_registers -> hvf_arch_put|get_registers
hvf_put_registers() and hvf_get_registers() are implemented per
target, rename them using the 'hvf_arch_' prefix following the
per target pattern.

Since they call hv_vcpu_set_reg() / hv_vcpu_get_reg(), mention
they must be called on the vCPU.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Mads Ynddal <mads@ynddal.dk>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-10-31 16:26:45 +00:00
Philippe Mathieu-Daudé
963f1576c0 accel/hvf: Rename hvf_vcpu_exec() -> hvf_arch_vcpu_exec()
hvf_vcpu_exec() is implemented per target, rename it as
hvf_arch_vcpu_exec(), following the per target pattern.

Since it calls hv_vcpu_run(), mention it must be called
on the vCPU.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Mads Ynddal <mads@ynddal.dk>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-10-31 16:26:45 +00:00
Julian Ganz
b5c8cd6144 target/i386: call plugin trap callbacks
We recently introduced API for registering callbacks for trap related
events as well as the corresponding hook functions. Due to differences
between architectures, the latter need to be called from target specific
code.

This change places the hook for x86 targets.

Signed-off-by: Julian Ganz <neither@nut.email>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20251027110344.2289945-16-alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
2025-10-29 14:12:43 +00:00
Paolo Bonzini
d5e1d2dea1 target/i386: clear CPU_INTERRUPT_SIPI for all accelerators
Similar to what commit df32e5c5 did for TCG; fixes boot with multiple
processors on WHPX and probably more accelerators

Fixes: df32e5c568 ("i386/cpu: Prevent delivering SIPI during SMM in TCG mode", 2025-10-14)
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3178
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-28 14:50:40 +01:00
Paolo Bonzini
1557adc826 accel/mshv: use return value of handle_pio_str_read
Coverity complains because we assign to ret here but
then never read it again before we overwrite it with
the call to set_x64_registers().

Analyzed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-28 14:50:07 +01:00
Xiaoyao Li
639a294227 i386/kvm/cpu: Init SMM cpu address space for hotplugged CPUs
The SMM cpu address space is initialized in a machine_init_done
notifier. It only runs once when QEMU starts up, which leads to the
issue that for any hotplugged CPU after the machine is ready, SMM
cpu address space doesn't get initialized.

Fix the issue by initializing the SMM cpu address space in x86_cpu_plug()
when the cpu is hotplugged.

Fixes: 591f817d81 ("target/i386: Define enum X86ASIdx for x86's address spaces")
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Closes: https://lore.kernel.org/qemu-devel/CAFEAcA_3kkZ+a5rTZGmK8W5K6J7qpYD31HkvjBnxWr-fGT2h_A@mail.gmail.com/
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20251014094216.164306-2-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-28 12:39:59 +01:00
Bernhard Beschow
337eece9c0 hw/i386/apic: Ensure own APIC use in apic_msr_{read,write}
Avoids the `current_cpu` global and seems more robust by not "forgetting" the
own APIC and then re-determining it by cpu_get_current_apic() which uses the
global.

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20251019210303.104718-9-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-10-21 20:16:47 +02:00
Bernhard Beschow
2fd15a24ca hw/i386/apic: Prefer APICCommonState over DeviceState
Makes the APIC API more type-safe by resolving quite a few APIC_COMMON
downcasts.

Like PICCommonState, the APICCommonState is now a public typedef while staying
an abstract datatype.

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20251019210303.104718-8-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-10-21 20:16:47 +02:00
Philippe Mathieu-Daudé
17db4d61d1 target/i386/monitor: Replace legacy cpu_physical_memory_read() calls
Commit b7ecba0f6f ("docs/devel/loads-stores.rst: Document our
various load and store APIs") mentioned cpu_physical_memory_*()
methods are legacy, the replacement being address_space_*().

Replace:

 - cpu_physical_memory_read(len=4) -> address_space_ldl()
 - cpu_physical_memory_read(len=8) -> address_space_ldq()

inlining the little endianness conversion via the '_le' suffix.
As with the previous implementation, ignore whether the memory
read succeeded or failed.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Message-Id: <20251002145742.75624-3-philmd@linaro.org>
2025-10-16 17:07:13 +02:00
Philippe Mathieu-Daudé
152820a991 target/i386/monitor: Propagate CPU address space to 'info mem' handlers
We want to replace the cpu_physical_memory_read() calls by
address_space_read() equivalents. Since the latter requires
an address space, and these commands are run in the context
of a vCPU, propagate its first address space. Next commit
will do the replacements.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Message-Id: <20251002145742.75624-2-philmd@linaro.org>
2025-10-16 17:06:57 +02:00
Philippe Mathieu-Daudé
665a8035b7 accel/kvm: Introduce KvmPutState enum
Join the 3 KVM_PUT_*_STATE definitions in a single enum.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Link: https://lore.kernel.org/r/20251008040715.81513-3-philmd@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-14 11:03:59 +02:00
Paolo Bonzini
58aa1d08bb target/i386: user: do not set up a valid LDT on reset
In user-mode emulation, QEMU uses the default setting of the LDT base
and limit, which places it at the bottom 64K of virtual address space.
However, by default there is no LDT at all in Linux processes, and
therefore the limit should be 0.

This is visible as a NULL pointer dereference in LSL and LAR instructions
when they try to read the LDT at an unmapped address.

Resolves: #1376
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-14 11:03:59 +02:00
Paolo Bonzini
0d22b621b7 target/i386: fix access to the T bit of the TSS
The T bit is bit 0 of the 16-bit word at offset 100 of the TSS.  However,
accessing it with a 32-bit word is not really correct, because bytes
102-103 contain the I/O map base address (relative to the base of the
TSS) and bits 1-15 are reserved.  In particular, any task switch to a TSS that
has a nonzero I/O map base address is broken.

This fixes the eventinj and taskswitch tests in kvm-unit-tests.

Cc: qemu-stable@nongnu.org
Fixes: ad441b8b79 ("target/i386: implement TSS trap bit", 2025-05-12)
Reported-by: Thomas Huth <thuth@redhat.com>
Closes: https://gitlab.com/qemu-project/qemu/-/issues/3101
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-14 11:03:59 +02:00
Thomas Ogrisegg
5a2faa0a0a target/i386: fix x86_64 pushw op
For x86_64 a 16 bit push op (pushw) of a memory address would generate
a 64 bit store on the stack instead of a 16 bit store.

For example:
        pushw (%rax)

behaves like
        pushq (%rax)

which is incorrect.

This patch fixes that.

Signed-off-by: Thomas Ogrisegg <tom-bugs-qemu@fnord.at>
Link: https://lore.kernel.org/r/20250715210307.GA1115@x1.fnord.at
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-14 11:03:59 +02:00
YiFei Zhu
cdba90ac1b i386/tcg/smm_helper: Properly apply DR values on SMM entry / exit
do_smm_enter and helper_rsm sets the env->dr, but does not sync the
values with cpu_x86_update_dr7. A malicious kernel may control the
instruction pointer in SMM by setting a breakpoint on the SMI
entry point, and after do_smm_enter cpu->breakpoints contains the
stale breakpoint; and because IDT is not reloaded upon SMI entry,
the debug exception handler controlled by the malicious kernel
is invoked.

Fixes: 01df040b52 ("x86: Debug register emulation (Jan Kiszka)")
Reported-by: unvariant.winter@gmail.com
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Link: https://lore.kernel.org/r/2bacb9b24e9d337dbe48791aa25d349eb9c52c3a.1758794468.git.zhuyifei@google.com
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-14 11:03:59 +02:00
Paolo Bonzini
df32e5c568 i386/cpu: Prevent delivering SIPI during SMM in TCG mode
[commit message by YiFei Zhu]

A malicious kernel may control the instruction pointer in SMM in a
multi-processor VM by sending a sequence of IPIs via APIC:

CPU0			CPU1
IPI(CPU1, MODE_INIT)
			x86_cpu_exec_reset()
			apic_init_reset()
			s->wait_for_sipi = true
IPI(CPU1, MODE_SMI)
			do_smm_enter()
			env->hflags |= HF_SMM_MASK;
IPI(CPU1, MODE_STARTUP, vector)
			do_cpu_sipi()
			apic_sipi()
			/* s->wait_for_sipi check passes */
			cpu_x86_load_seg_cache_sipi(vector)

A different sequence, SMI INIT SIPI, is also buggy in TCG because
INIT is not blocked or latched during SMM. However, it is not
vulnerable to an instruction pointer control in the same way because
x86_cpu_exec_reset clears env->hflags, exiting SMM.

Fixes: a9bad65d2c ("target-i386: wake up processors that receive an SMI")
Analyzed-by: YiFei Zhu <zhuyifei@google.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-14 11:03:58 +02:00
Jon Kohler
00001a22d1 i386/kvm: Expose ARCH_CAP_FB_CLEAR when invulnerable to MDS
Newer Intel hardware (Sapphire Rapids and higher) sets multiple MDS
immunity bits in MSR_IA32_ARCH_CAPABILITIES but lacks the hardware-level
MSR_ARCH_CAP_FB_CLEAR (bit 17):
    ARCH_CAP_MDS_NO
    ARCH_CAP_TAA_NO
    ARCH_CAP_PSDP_NO
    ARCH_CAP_FBSDP_NO
    ARCH_CAP_SBDR_SSDP_NO

This prevents VMs with fb-clear=on from migrating from older hardware
(Cascade Lake, Ice Lake) to newer hardware, limiting live migration
capabilities. Note fb-clear was first introduced in v8.1.0 [1].

Expose MSR_ARCH_CAP_FB_CLEAR for MDS-invulnerable systems to enable
seamless migration between hardware generations.

Note: There is no impact when a guest migrates to newer hardware as
the existing bit combinations already mark the host as MMIO-immune and
disable FB_CLEAR operations in the kernel (see Linux's
arch_cap_mmio_immune() and vmx_update_fb_clear_dis()). See kernel side
discussion for [2] for additional context.

[1] 22e1094ca8 ("target/i386: add support for FB_CLEAR feature")
[2] https://patchwork.kernel.org/project/kvm/patch/20250401044931.793203-1-jon@nutanix.com/

Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jon Kohler <jon@nutanix.com>
Link: https://lore.kernel.org/r/20251008202557.4141285-1-jon@nutanix.com
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-14 11:03:58 +02:00
Mathias Krause
df9a3372dd target/i386: Fix CR2 handling for non-canonical addresses
Commit 3563362ddf ("target/i386: Introduce structures for mmu_translate")
accidentally modified CR2 for non-canonical address exceptions while these
should lead to a #GP / #SS instead -- without changing CR2.

Fix that.

A KUT test for this was submitted as [1].

[1] https://lore.kernel.org/kvm/20250612141637.131314-1-minipli@grsecurity.net/

Fixes: 3563362ddf ("target/i386: Introduce structures for mmu_translate")
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Link: https://lore.kernel.org/r/20250612142155.132175-1-minipli@grsecurity.net
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-14 11:03:58 +02:00
Babu Moger
d8ec0baf4a target/i386: Add TSA feature flag verw-clear
Transient Scheduler Attacks (TSA) are new speculative side channel attacks
related to the execution timing of instructions under specific
microarchitectural conditions. In some cases, an attacker may be able to
use this timing information to infer data from other contexts, resulting in
information leakage

CPUID Fn8000_0021 EAX[5] (VERW_CLEAR). If this bit is 1, the memory form of
the VERW instruction may be used to help mitigate TSA.

Link: https://www.amd.com/content/dam/amd/en/documents/resources/bulletin/technical-guidance-for-mitigating-transient-scheduler-attacks.pdf
Co-developed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/e6362672e3a67a9df661a8f46598335a1a2d2754.1752176771.git.babu.moger@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-14 11:03:58 +02:00
Babu Moger
c79a35acad target/i386: Add TSA attack variants TSA-SQ and TSA-L1
Transient Scheduler Attacks (TSA) are new speculative side channel attacks
related to the execution timing of instructions under specific
microarchitectural conditions. In some cases, an attacker may be able to
use this timing information to infer data from other contexts, resulting in
information leakage.

AMD has identified two sub-variants two variants of TSA.
CPUID Fn8000_0021 ECX[1] (TSA_SQ_NO).
	If this bit is 1, the CPU is not vulnerable to TSA-SQ.

CPUID Fn8000_0021 ECX[2] (TSA_L1_NO).
	If this bit is 1, the CPU is not vulnerable to TSA-L1.

Add the new feature word FEAT_8000_0021_ECX and corresponding bits to
detect TSA variants.

Link: https://www.amd.com/content/dam/amd/en/documents/resources/bulletin/technical-guidance-for-mitigating-transient-scheduler-attacks.pdf
Co-developed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/12881b2c03fa351316057ddc5f39c011074b4549.1752176771.git.babu.moger@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-14 11:03:58 +02:00
Richard Henderson
1188b07e60 * i386: fix migration issues in 10.1
* target/i386/mshv: new accelerator
 * rust: use glib-sys-rs
 * rust: fixes for docker tests
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmjnaOwUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNsFQf/WXKxZLLnItHwDz3UdwjzewPWpz5N
 fpS0E4C03J8pACDgyfl7PQl47P7NlJ08Ig2Lc5l3Z9KiAKgh0orR7Cqd0BY5f9lo
 uk4FgXfXpQyApywAlctadrTfcH8sRv2tMaP6EJ9coLtJtHW9RUGFPaZeMsqrjpAl
 TpwAXPYNDDvvy1ih1LPh5DzOPDXE4pin2tDa94gJei56gY95auK4zppoNYLdB3kR
 GOyR4QK43/yhuxPHOmQCZOE3HK2XrKgMZHWIjAovjZjZFiJs49FaHBOpRfFpsUlG
 PB3UbIMtu69VY20LqbbyInPnyATRQzqIGnDGTErP6lfCGTKTy2ulQYWvHA==
 =KM5O
 -----END PGP SIGNATURE-----

Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging

* i386: fix migration issues in 10.1
* target/i386/mshv: new accelerator
* rust: use glib-sys-rs
* rust: fixes for docker tests

# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmjnaOwUHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroNsFQf/WXKxZLLnItHwDz3UdwjzewPWpz5N
# fpS0E4C03J8pACDgyfl7PQl47P7NlJ08Ig2Lc5l3Z9KiAKgh0orR7Cqd0BY5f9lo
# uk4FgXfXpQyApywAlctadrTfcH8sRv2tMaP6EJ9coLtJtHW9RUGFPaZeMsqrjpAl
# TpwAXPYNDDvvy1ih1LPh5DzOPDXE4pin2tDa94gJei56gY95auK4zppoNYLdB3kR
# GOyR4QK43/yhuxPHOmQCZOE3HK2XrKgMZHWIjAovjZjZFiJs49FaHBOpRfFpsUlG
# PB3UbIMtu69VY20LqbbyInPnyATRQzqIGnDGTErP6lfCGTKTy2ulQYWvHA==
# =KM5O
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 09 Oct 2025 12:49:00 AM PDT
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [unknown]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [unknown]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (35 commits)
  rust: fix path to rust_root_crate.sh
  tests/docker: make --enable-rust overridable with EXTRA_CONFIGURE_OPTS
  MAINTAINERS: Add maintainers for mshv accelerator
  docs: Add mshv to documentation
  target/i386/mshv: Use preallocated page for hvcall
  qapi/accel: Allow to query mshv capabilities
  accel/mshv: Handle overlapping mem mappings
  target/i386/mshv: Implement mshv_vcpu_run()
  target/i386/mshv: Write MSRs to the hypervisor
  target/i386/mshv: Integrate x86 instruction decoder/emulator
  target/i386/mshv: Register MSRs with MSHV
  target/i386/mshv: Register CPUID entries with MSHV
  target/i386/mshv: Set local interrupt controller state
  target/i386/mshv: Implement mshv_arch_put_registers()
  target/i386/mshv: Implement mshv_get_special_regs()
  target/i386/mshv: Implement mshv_get_standard_regs()
  target/i386/mshv: Implement mshv_store_regs()
  target/i386/mshv: Add CPU create and remove logic
  accel/mshv: Add vCPU signal handling
  accel/mshv: Add vCPU creation and execution loop
  ...

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2025-10-09 07:59:01 -07:00
Magnus Kulke
e4a20afce5 target/i386/mshv: Use preallocated page for hvcall
There are hvcalls that are invoked during MMIO exits, the payload is of
dynamic size. To avoid heap allocations we can use preallocated pages as
in/out buffer for those calls. A page is reserved per vCPU and used for
set/get register hv calls.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250916164847.77883-26-magnuskulke@linux.microsoft.com
[Use standard MAX_CONST macro; mshv.h/mshv_int.h split. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-08 19:17:31 +02:00
Magnus Kulke
efc4093358 accel/mshv: Handle overlapping mem mappings
QEMU maps certain regions into the guest multiple times, as seen in the
trace below. Currently the MSHV kernel driver will reject those
mappings. To workaround this, a record is kept (a static global list of
"slots", inspired by what the HVF accelerator has implemented). An
overlapping region is not registered at the hypervisor, and marked as
mapped=false. If there is an UNMAPPED_GPA exit, we can look for a slot
that is unmapped and would cover the GPA. In this case we map out the
conflicting slot and map in the requested region.

mshv_set_phys_mem       add=1 name=pc.bios
mshv_map_memory      => u_a=7ffff4e00000 gpa=00fffc0000 size=00040000
mshv_set_phys_mem       add=1 name=ioapic
mshv_set_phys_mem       add=1 name=hpet
mshv_set_phys_mem       add=0 name=pc.ram
mshv_unmap_memory       u_a=7fff67e00000 gpa=0000000000 size=80000000
mshv_set_phys_mem       add=1 name=pc.ram
mshv_map_memory         u_a=7fff67e00000 gpa=0000000000 size=000c0000
mshv_set_phys_mem       add=1 name=pc.rom
mshv_map_memory         u_a=7ffff4c00000 gpa=00000c0000 size=00020000
mshv_set_phys_mem       add=1 name=pc.bios
mshv_remap_attempt   => u_a=7ffff4e20000 gpa=00000e0000 size=00020000

The mapping table is guarded by a mutex for concurrent modification and
RCU mechanisms for concurrent reads. Writes occur rarely, but we'll have
to verify whether an unmapped region exist for each UNMAPPED_GPA exit,
which happens frequently.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250916164847.77883-24-magnuskulke@linux.microsoft.com
[Fix format strings for trace-events; mshv.h/mshv_int.h split. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-08 19:17:31 +02:00
Magnus Kulke
6dec60528c target/i386/mshv: Implement mshv_vcpu_run()
Add the main vCPU execution loop for MSHV using the MSHV_RUN_VP ioctl.

The execution loop handles guest entry and VM exits. There are handlers for
memory r/w, PIO and MMIO to which the exit events are dispatched.

In case of MMIO the i386 instruction decoder/emulator is invoked to
perform the operation in user space.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250916164847.77883-23-magnuskulke@linux.microsoft.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-08 19:17:31 +02:00
Magnus Kulke
64118f452c target/i386/mshv: Write MSRs to the hypervisor
Push current model-specific register (MSR) values to MSHV's vCPUs as
part of setting state to the hypervisor.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250916164847.77883-22-magnuskulke@linux.microsoft.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-08 19:17:31 +02:00
Magnus Kulke
9bc6a1d296 target/i386/mshv: Integrate x86 instruction decoder/emulator
Connect the x86 instruction decoder and emulator to the MSHV backend
to handle intercepted instructions. This enables software emulation
of MMIO operations in MSHV guests. MSHV has a translate_gva hypercall
that is used to accessing the physical guest memory.

A guest might read from unmapped memory regions (e.g. OVMF will probe
0xfed40000 for a vTPM). In those cases 0xFF bytes is returned instead of
aborting the execution.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250916164847.77883-21-magnuskulke@linux.microsoft.com
[mshv.h/mshv_int.h split. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-08 19:17:31 +02:00
Magnus Kulke
f38e2a63e5 target/i386/mshv: Register MSRs with MSHV
Build and register the guest vCPU's model-specific registers using
the MSHV interface.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250916164847.77883-20-magnuskulke@linux.microsoft.com
[mshv.h/mshv_int.h split. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-08 19:17:31 +02:00
Magnus Kulke
4fa04dd162 target/i386/mshv: Register CPUID entries with MSHV
Convert the guest CPU's CPUID model into MSHV's format and register it
with the hypervisor. This ensures that the guest observes the correct
CPU feature set during CPUID instructions.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250916164847.77883-19-magnuskulke@linux.microsoft.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-08 19:17:31 +02:00
Magnus Kulke
ca20d46fa9 target/i386/mshv: Set local interrupt controller state
To set the local interrupt controller state, perform hv calls retrieving
partition state from the hypervisor.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250916164847.77883-18-magnuskulke@linux.microsoft.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-08 19:17:30 +02:00
Magnus Kulke
25a1d871e0 target/i386/mshv: Implement mshv_arch_put_registers()
Write CPU register state to MSHV vCPUs. Various mapping functions to
prepare the payload for the HV call have been implemented.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250916164847.77883-17-magnuskulke@linux.microsoft.com
[mshv.h/mshv_int.h split. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-08 19:17:30 +02:00
Magnus Kulke
0382c2c854 target/i386/mshv: Implement mshv_get_special_regs()
Retrieve special registers (e.g. segment, control, and descriptor
table registers) from MSHV vCPUs.

Various helper functions to map register state representations between
Qemu and MSHV are introduced.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250916164847.77883-16-magnuskulke@linux.microsoft.com
[mshv.h/mshv_int.h split. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-08 19:17:30 +02:00