TCG_REG_TMP0 may be used by set_vtype* to load the vtype
parameter, so delay any other use of TCG_REG_TMP0 until
the correct vtype has been installed.
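A minimal sketch of the required ordering, using names from
tcg/riscv/tcg-target.c.inc (the call site shown is illustrative):

    /* set_vtype_len() may clobber TCG_REG_TMP0 to load vtype ... */
    set_vtype_len(s, type);
    /* ... so use TCG_REG_TMP0 for other values only afterwards. */
    tcg_out_movi(s, TCG_TYPE_REG, TCG_REG_TMP0, val);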
Cc: qemu-stable@nongnu.org
Fixes: d4be6ee111 ("tcg/riscv: Implement vector mov/dup{m/i}")
Reported-by: Zhijin Zeng <zengzhijin@linux.spacemit.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit af6db3b71310ea63a018d517ba7d79e4e014db62)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
We inadvertently disabled affected-bits optimizations on operations
that use fold_masks_zosa. These happen relatively often in x86 code
for extract/sextract; for example, given the following:
mov %esi, %ebp
xor $0x1, %ebp
the optimizer is able to simplify the "extract_i64 rbp,tmp0,$0x0,$0x20"
produced by the second instruction to a move.
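A worked view of the fold, assuming the usual z_mask bookkeeping: the
32-bit xor leaves tmp0 with bits 32..63 known zero (z_mask = 0xffffffff),
so extracting the low 32 bits affects no bits (a_mask = 0) and

    extract_i64 rbp,tmp0,$0x0,$0x20   ->   mov_i64 rbp,tmp0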
Cc: qemu-stable@nongnu.org
Fixes: 932522a9dd ("tcg/optimize: Fold and to extract during optimize")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20251223163720.985578-1-pbonzini@redhat.com>
(cherry picked from commit 23b53ec3a8a279cb5acd5e022b464a4272fe9f8c)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
In computing a_mask, for or, we remove the bits from t1->o_mask
which are known to be zero. For orc, the bits known to be zero
are the inverse of those known to be one.
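A sketch of the distinction, using the mask conventions of
tcg/optimize.c (z_mask = bits that may be one, o_mask = bits known
one); the exact in-tree expressions may differ in form:

    /* or:  t1 | t2 — drop the bits where t2 is known zero: */
    a_mask = ~t1->o_mask & t2->z_mask;
    /* orc: t1 | ~t2 — ~t2 is known zero exactly where t2 is known one: */
    a_mask = ~t1->o_mask & ~t2->o_mask;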
Cc: qemu-stable@nongnu.org
Fixes: cc4033ee47 ("tcg/optimize: Build and use zero, one and affected bits in fold_orc")
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 08b12bfb8f532dbc62e35c31d081ede1aa12098b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
When adding o_mask to this function, we used it in a
couple of places but failed to save it for future use.
Also, update a related comment.
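A minimal sketch of the shape of the fix, assuming the fold_masks_zosa()
bookkeeping in tcg/optimize.c (surrounding code illustrative):

    ti = ts_info(ts);
    ti->z_mask = z_mask;
    ti->s_mask = s_mask;
    ti->o_mask = o_mask;   /* this store was missing */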
Cc: qemu-stable@nongnu.org
Fixes: 56f15f67ea ("tcg/optimize: Add one's mask to TempOptInfo")
Reported-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 7d2d577de0c72f3cf2eb43f1534e908070d3bc47)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
All callers have already tested tcg_ctx->plugin_insn.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Since d182123974, the number of bits in a MemOpIdx tops out at 17,
which won't fit in the TCI rrm format, resulting in an assertion failure.
Introduce new opcodes that take the MemOpIdx from a register, as
we already do for qemu_ld2 and qemu_st2.
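A rough accounting, assuming the 32-bit instruction words packed by
tcg/tci: 8 bits of opcode plus two 4-bit register fields leave
32 - 8 - 4 - 4 = 16 bits for the rrm immediate, one short of the
17 now required.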
Fixes: d182123974 ("include/exec/memopidx: Adjust for 32 mmu indexes")
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
For native code generation, zero-extending 32-bit addresses for
the slow path helpers happens in tcg_out_{ld,st}_helper_args,
but there isn't really a slow path for TCI, so that didn't happen.
Make the extension for TCI explicit in the opcode stream,
much like we already do for plugins and atomic helpers.
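A hedged sketch of the emitted widening, borrowing tcg_out_ext32u()
and the backend's scratch register (registers illustrative):

    /* TCI has no slow path in which to zero-extend, so widen the
     * 32-bit guest address explicitly before the load/store. */
    if (TCG_TARGET_REG_BITS == 64 && addr_type == TCG_TYPE_I32) {
        tcg_out_ext32u(s, TCG_REG_TMP, addr_reg);
        addr_reg = TCG_REG_TMP;
    }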
Cc: qemu-stable@nongnu.org
Fixes: 24e46e6c9d ("accel/tcg: Widen tcg-ldst.h addresses to uint64_t")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The else after the TCG_TARGET_HAS_extract2 test is exactly
the same as what tcg_gen_extract2_i32 would emit itself.
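Roughly, that expansion is (a sketch of the generic fallback in
tcg/tcg-op.c):

    TCGv_i32 t0 = tcg_temp_ebb_new_i32();
    tcg_gen_shri_i32(t0, al, ofs);
    tcg_gen_deposit_i32(ret, t0, ah, 32 - ofs, ofs);
    tcg_temp_free_i32(t0);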
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
For Arm, we need 3 cases: (1) the alignment required when accessing
Normal memory, (2) the alignment required when accessing Device memory,
and (3) the atomicity of the access.
When we added TLB_CHECK_ALIGNED, we assumed that cases 2 and 3 were
identical, and thus used memop_atomicity_bits for TLB_CHECK_ALIGNED.
This is incorrect for multiple reasons, including that the atomicity
of the access is adjusted depending on whether or not we are executing
within a serial context.
For Arm, what is true is that there is an underlying alignment
requirement for the access, and that Normal memory will support
unaligned accesses.
Introduce MO_ALIGN_TLB_ONLY to indicate that the alignment
specified in MO_AMASK applies only when the TLB entry has
TLB_CHECK_ALIGNED set; otherwise no alignment is required.
Introduce memop_tlb_alignment_bits with an additional bool
argument that specifies whether TLB_CHECK_ALIGNED is set.
All other usage of memop_alignment_bits assumes it is not.
Remove memop_atomicity_bits as unused; it didn't properly
support MO_ATOM_SUBWORD anyway.
Update target/arm finalize_memop_atom to set MO_ALIGN_TLB_ONLY
when strict alignment isn't otherwise required.
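A hedged sketch of the helper's intended shape (illustrative, not the
in-tree body):

    unsigned memop_tlb_alignment_bits(MemOp memop, bool tlb_check_aligned)
    {
        if ((memop & MO_ALIGN_TLB_ONLY) && !tlb_check_aligned) {
            return 0;  /* alignment applies only via TLB_CHECK_ALIGNED */
        }
        return memop_alignment_bits(memop);
    }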
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3171
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
32-bit host support is deprecated since commit 6d701c9bac
("meson: Deprecate 32-bit host support"), released as v10.0.
The next release being v10.2, we can remove the TCG backend
for 32-bit PPC hosts.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20251014173900.87497-2-philmd@linaro.org>
Missed some lines when converting to TCGOutOpQemuLdSt*.
Fixes: 86fe5c2597 ("tcg: Convert qemu_st{2} to TCGOutOpLdSt{2}")
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
These aliases existed to simplify code for O32 and N32.
Now that the 64-bit ABI is the only one supported, we
can use the DADD* instructions directly.
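For instance, where the alias formerly selected ADDU (32-bit pointers)
or DADDU by ABI, the backend can now emit the 64-bit opcode directly
(a sketch using tcg/mips names):

    /* Formerly ALIAS_PADD; with only the 64-bit ABI left, just: */
    tcg_out_opc_reg(s, OPC_DADDU, ret, arg1, arg2);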
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
32-bit host support is deprecated since commit 6d701c9bac
("meson: Deprecate 32-bit host support"), released as v10.0.
The next release being v10.2, we can remove the TCG backend
for 32-bit MIPS hosts.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20251009195210.33161-6-philmd@linaro.org>
See previous commit for rationale.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20251009195210.33161-5-philmd@linaro.org>
tcg_region_init() calls one of qemu_mprotect_rwx(),
qemu_mprotect_rw(), and mprotect(), then reports failure with
error_setg_errno(&error_fatal, errno, ...).
The use of &error_fatal is undesirable. qapi/error.h advises:
* Please don't error_setg(&error_fatal, ...), use error_report() and
* exit(), because that's more obvious.
The use of errno is wrong. qemu_mprotect_rwx() and qemu_mprotect_rw()
wrap around qemu_mprotect__osdep(). qemu_mprotect__osdep() calls
mprotect() on POSIX, VirtualProtect() on Windows, and reports failure
with error_report(). VirtualProtect() doesn't set errno. mprotect()
does, but error_report() may clobber it.
Fix tcg_region_init() to report errors only when it calls mprotect(),
and rely on qemu_mprotect_rwx()'s and qemu_mprotect_rw()'s error
reporting otherwise. Use error_report(), not error_setg().
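A hedged sketch of the corrected pattern (variable names illustrative):

    /* qemu_mprotect_rw() has already reported the failure; just exit. */
    if (qemu_mprotect_rw(start, size)) {
        exit(1);
    }
    /* For the direct mprotect() call, consume errno before anything
     * can clobber it, and use error_report() rather than error_setg(). */
    if (mprotect(end, page_size, PROT_NONE)) {
        error_report("mprotect of jit guard page: %s", strerror(errno));
        exit(1);
    }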
Fixes: 22c6a9938f ("tcg: Merge buffer protection and guard page protection")
Fixes: 6bc144237a ("tcg: Use Error with alloc_code_gen_buffer")
Cc: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250923091000.3180122-3-armbru@redhat.com>
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
This array is within CPUNegativeOffsetState, which means the
last element of the array has an offset from env with the
smallest magnitude. This can be encoded into fewer bits
when generating TCG fast path memory references.
When we changed NB_MMU_MODES to be a global constant,
rather than a per-target value, we pessimized the code
generated for targets which use only a few mmu indexes.
By inverting the array index, we counteract that.
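A sketch of the resulting index math (hypothetical accessor name):

    /* f[NB_MMU_MODES - 1] has the smallest offset from env, so map
     * the low, frequently used mmu_idx values onto the array's end. */
    static inline CPUTLBDescFast *tlb_fast(CPUState *cpu, int mmu_idx)
    {
        return &cpu->neg.tlb.f[NB_MMU_MODES - 1 - mmu_idx];
    }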
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Encapsulate access to cpu->neg.tlb.f[] in a function.
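For example (hypothetical name; the in-tree helper may differ):

    static inline CPUTLBDescFast *tlb_fast(CPUState *cpu, int mmu_idx)
    {
        return &cpu->neg.tlb.f[mmu_idx];
    }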
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
A constant matrix can describe the movement of the 8 bits,
so these shifts can be performed with one instruction.
Logic courtesy of Andi Kleen <ak@linux.intel.com>:
https://gcc.gnu.org/pipermail/gcc-patches/2025-August/691624.html
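Conceptually (ignoring vgf2p8affineqb's exact row/column bit numbering),
a left shift by C of a byte is the GF(2)-linear map out[i] = in[i-C],
so each byte's transform is an 8x8 0/1 matrix packed into one 64-bit
immediate; an illustrative construction:

    /* Put a 1 at (row i, column i-C) for i >= C; the bit layout
     * within the qword is schematic, not the ISA's encoding. */
    uint64_t m = 0;
    for (int i = C; i < 8; i++) {
        m |= 1ull << (i * 8 + (i - C));
    }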
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Add a backend-specific opcode for expanding the
GFNI vgf2p8affineqb instruction, which we can use
for expanding 8-bit immediate shifts and rotates.
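A plausible shape for the definition, following the existing x86_*_vec
entries in tcg/i386/tcg-target-opc.h.inc (name, operand counts and
flags all illustrative):

    DEF(x86_vgf2p8affineqb_vec, 1, 2, 1, TCG_OPF_VECTOR)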
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The optimizer prefers to have constants as the second operand,
so expand LT x,0 instead of GT 0,x. This will not affect the
generated code at all.
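For example, with zero denoting a zero-filled vector temporary (sketch):

    tcg_gen_cmp_vec(TCG_COND_GT, vece, t, zero, x);   /* before */
    tcg_gen_cmp_vec(TCG_COND_LT, vece, t, x, zero);   /* after, same result */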
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Expand arithmetic right shift of bits-1 as a comparison vs 0.
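With zero a zero-filled vector (illustrative), a sketch of the
expansion; x >> (bits-1) replicates the sign bit, which is exactly
the -1/0 that the comparison produces:

    tcg_gen_cmp_vec(TCG_COND_LT, vece, r, a, zero);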
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
When converting from tcg_out_deposit, the arguments were not
shuffled properly.
Cc: qemu-stable@nongnu.org
Fixes: cf4905c031 ("tcg: Convert deposit to TCGOutOpDeposit")
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20250815122653.701782-5-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
There is no such thing as vector extract.
Fixes: 932522a9dd ("tcg/optimize: Fold and to extract during optimize")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3036
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Tested-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Avoid ubsan failure with clang-20,
tcg.h:715:19: runtime error: applying non-zero offset 64 to null pointer
by not using pointers.
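The shape of the fix, as a hedged sketch: do the arithmetic on
integers, since adding a non-zero offset to a null pointer is
undefined even if the result is never dereferenced.

    /* before (UB when base is NULL): */
    p = (char *)base + off;
    /* after: */
    p = (void *)((uintptr_t)base + off);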
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Fix the direction of the shift, introduced when converting
the codebase to TCGOutOp* and small tgen_* helpers.
Fixes: 5a4d034f3c ("tcg: Convert extract to TCGOutOpExtract")
Reported-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Andrea Bolognani <abologna@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Both cases are handled by fold_xor after conversion.
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
If operand 2 is constant, then the computation of z_mask and a_mask
will produce the same results as the explicit check via fold_xi_to_i.
Shift the calls of fold_xx_to_i and fold_ix_to_not down below the
i2->is_const check.
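A sketch of the reordering, taking one affected fold as the example
(illustrative; the in-tree code differs in detail):

    /* Test the constant case first: it is fully subsumed by the
     * z_mask/a_mask computation that follows.  Only try the xx/ix
     * simplifications when operand 2 is not constant. */
    if (!ti_is_const(t2) &&
        (fold_xx_to_i(ctx, op, 0) || fold_ix_to_not(ctx, op, -1))) {
        return true;
    }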
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
If operand 2 is constant, then the computation of z_mask and a_mask
will produce the same results as the explicit check via fold_xi_to_i.
Shift the calls of fold_xx_to_i and fold_ix_to_not down below the
i2->is_const check.
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
If operand 2 is constant, then the computation of z_mask
and a_mask will produce the same results as the explicit
checks via fold_xi_to_i and fold_xi_to_x. Shift the call
of fold_xx_to_x down below the ti_is_const(t2) check.
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
When lowering tst comparisons, completely fold the and
opcode that we generate.
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This was the last use of fold_affected_mask,
now fully replaced by fold_masks_zosa.
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>