Commit graph

15794 commits

Author SHA1 Message Date
Peter Maydell
d168a08147 target/arm: Remove redundant advsimd float16 helpers
The advsimd_addh etc helpers defined in helper-a64.c are identical to
the vfp_addh etc helpers defined in helper-vfp.c: both take two
float16 inputs (in a uint32_t type) plus a float_status* and are
simple wrappers around the softfloat float16_* functions.

(The duplication seems to be a historical accident: we added the
advsimd helpers in 2018 as part of the A64 implementation, and at
that time there was no f16 emulation in A32.  Then later we added the
A32 f16 handling by extending the existing VFP helper macros to
generate f16 versions as well as f32 and f64, and didn't realise we
could clean things up.)

Remove the now-unnecessary advsimd helpers and make the places that
generated calls to them use the vfp helpers instead. Many of the
helper functions were already unused.

(The remaining advsimd_ helpers are those which don't have vfp
versions.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250124162836.2332150-26-peter.maydell@linaro.org
2025-01-28 18:40:19 +00:00
Peter Maydell
7af64d103d fpu: Rename float_flag_output_denormal to float_flag_output_denormal_flushed
Our float_flag_output_denormal exception flag is set when
the fpu code flushes an output denormal to zero. Rename
it to float_flag_output_denormal_flushed:
 * this keeps it parallel with the flag for flushing
   input denormals, which we just renamed
 * it makes it clearer that it doesn't mean "set when
   the output is a denormal"

Commit created with
 for f in `git grep -l float_flag_output_denormal`; do sed -i -e 's/float_flag_output_denormal/float_flag_output_denormal_flushed/' $f; done

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250124162836.2332150-21-peter.maydell@linaro.org
2025-01-28 18:40:19 +00:00
Peter Maydell
584b7aec81 fpu: Rename float_flag_input_denormal to float_flag_input_denormal_flushed
Our float_flag_input_denormal exception flag is set when the fpu code
flushes an input denormal to zero.  This is what many guest
architectures (eg classic Arm behaviour) require, but it is not the
only donarmal-related reason we might want to set an exception flag.
The x86 behaviour (which we do not currently model correctly) wants
to see an exception flag when a denormal input is *not* flushed to
zero and is actually used in an arithmetic operation. Arm's FEAT_AFP
also wants these semantics.

Rename float_flag_input_denormal to float_flag_input_denormal_flushed
to make it clearer when it is set and to allow us to add a new
float_flag_input_denormal_used next to it for the x86/FEAT_AFP
semantics.

Commit created with
 for f in `git grep -l float_flag_input_denormal`; do sed -i -e 's/float_flag_input_denormal/float_flag_input_denormal_flushed/' $f; done

and manual editing of softfloat-types.h and softfloat.c to clean
up the indentation afterwards and to fix a comment which wasn't
using the full name of the flag.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250124162836.2332150-20-peter.maydell@linaro.org
2025-01-28 18:40:19 +00:00
Peter Maydell
3847b5b1fb target/arm: Remove now-unused vfp.fp_status_f16 and FPST_FPCR_F16
Now we have moved all the uses of vfp.fp_status_f16 and FPST_FPCR_F16
to the new A32 or A64 fields, we can remove these.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250124162836.2332150-19-peter.maydell@linaro.org
2025-01-28 18:40:19 +00:00
Peter Maydell
230c2bd3f2 target/arm: Use FPST_A64_F16 in A64 decoder
In the A32 decoder, use FPST_A64_F16 rather than FPST_FPCR_F16.
By doing an automated conversion of the whole file we avoid possibly
using more than one fpst value in a set_rmode/op/restore_rmode
sequence.

Patch created with
  perl -p -i -e 's/FPST_FPCR_F16(?!_)/FPST_A64_F16/g' target/arm/tcg/translate-{a64,sve,sme}.c

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250124162836.2332150-18-peter.maydell@linaro.org
2025-01-28 18:40:19 +00:00
Peter Maydell
e935710bc8 target/arm: Use FPST_A32_F16 in A32 decoder
In the A32 decoder, use FPST_A32_F16 rather than FPST_FPCR_F16.
By doing an automated conversion of the whole file we avoid possibly
using more than one fpst value in a set_rmode/op/restore_rmode
sequence.

Patch created with
  perl -p -i -e 's/FPST_FPCR_F16(?!_)/FPST_A32_F16/g' target/arm/tcg/translate-vfp.c

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250124162836.2332150-17-peter.maydell@linaro.org
2025-01-28 18:40:19 +00:00
Peter Maydell
e4b3c388f9 target/arm: Use fp_status_f16_a64 in AArch64-only helpers
We directly use fp_status_f16 in a handful of helpers that are
AArch64-specific; switch to fp_status_f16_a64 for these.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250124162836.2332150-16-peter.maydell@linaro.org
2025-01-28 18:40:19 +00:00
Peter Maydell
85fffc1085 target/arm: Use fp_status_f16_a32 in AArch32-only helpers
We directly use fp_status_f16 in a handful of helpers that
are AArch32-specific; switch to fp_status_f16_a32 for these.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250124162836.2332150-15-peter.maydell@linaro.org
2025-01-28 18:40:19 +00:00
Peter Maydell
5f4ed6da85 target/arm: Define new fp_status_f16_a32 and fp_status_f16_a64
As the first part of splitting the existing fp_status_f16
into separate float_status fields for AArch32 and AArch64
(so that we can make FEAT_AFP control bits apply only
for AArch64), define the two new fp_status_f16_a32 and
fp_status_f16_a64 fields, but don't use them yet.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250124162836.2332150-14-peter.maydell@linaro.org
2025-01-28 18:40:19 +00:00
Peter Maydell
2aa9656ebc target/arm: Remove now-unused vfp.fp_status and FPST_FPCR
Now we have moved all the uses of vfp.fp_status and FPST_FPCR
to either the A32 or A64 fields, we can remove these.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250124162836.2332150-13-peter.maydell@linaro.org
2025-01-28 18:40:19 +00:00
Peter Maydell
e107a7a54e target/arm: Use FPST_A64 in A64 decoder
In the A64 decoder, use FPST_A64 rather than FPST_FPCR.  By
doing an automated conversion of the whole file we avoid possibly
using more than one fpst value in a set_rmode/op/restore_rmode
sequence.

Patch created with

  perl -p -i -e 's/FPST_FPCR(?!_)/FPST_A64/g' target/arm/tcg/translate-{a64,sve,sme}.c

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250124162836.2332150-12-peter.maydell@linaro.org
2025-01-28 18:40:19 +00:00
Peter Maydell
961a8b3fb8 target/arm: Use FPST_A32 in A32 decoder
In the A32 decoder, use FPST_A32 rather than FPST_FPCR.  By
doing an automated conversion of the whole file we avoid possibly
using more than one fpst value in a set_rmode/op/restore_rmode
sequence.

Patch created with
  perl -p -i -e 's/FPST_FPCR(?!_)/FPST_A32/g' target/arm/tcg/translate-vfp.c

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250124162836.2332150-11-peter.maydell@linaro.org
2025-01-28 18:40:19 +00:00
Peter Maydell
d1ce6db3b1 target/arm: Use fp_status_a32 in vfp_cmp helpers
The helpers vfp_cmps, vfp_cmpes, vfp_cmpd, vfp_cmped are used only from
the A32 decoder; the A64 decoder uses separate vfp_cmps_a64 etc helpers
(because for A64 we update the main NZCV flags and for A32 we update
the FPSCR NZCV flags). So we can make these helpers use the fp_status_a32
field instead of fp_status.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250124162836.2332150-10-peter.maydell@linaro.org
2025-01-28 18:40:19 +00:00
Peter Maydell
1069d8ab30 target/arm: Use fp_status_a32 in vjvct helper
Use fp_status_a32 in the vjcvt helper function; this is called only
from the A32/T32 decoder and is not used inside a
set_rmode/restore_rmode sequence.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250124162836.2332150-9-peter.maydell@linaro.org
2025-01-28 18:40:19 +00:00
Peter Maydell
75df4e8609 target/arm: Use fp_status_a64 or fp_status_a32 in is_ebf()
In is_ebf(), we might be called for A64 or A32, but we have
the CPUARMState* so we can select fp_status_a64 or
fp_status_a32 accordingly.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2025-01-28 18:40:19 +00:00
Peter Maydell
57bd2f30ff target/arm: Use vfp.fp_status_a64 in A64-only helper functions
Switch from vfp.fp_status to vfp.fp_status_a64 for helpers which:
 * directly reference an fp_status field
 * are called only from the A64 decoder
 * are not called inside a set_rmode/restore_rmode sequence

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20250124162836.2332150-8-peter.maydell@linaro.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2025-01-28 18:40:19 +00:00
Peter Maydell
2208cb46e6 target/arm: Define new fp_status_a32 and fp_status_a64
We want to split the existing fp_status in the Arm CPUState into
separate float_status fields for AArch32 and AArch64.  (This is
because new control bits defined by FEAT_AFP only have an effect for
AArch64, not AArch32.) To make this split we will:
 * define new fp_status_a32 and fp_status_a64 which have
   identical behaviour to the existing fp_status
 * move existing uses of fp_status to fp_status_a32 or
   fp_status_a64 as appropriate
 * delete the old fp_status when it has no uses left

In this patch we add the new float_status fields.

We will also need to split fp_status_f16, but we will do that
as a separate series of patches.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250124162836.2332150-7-peter.maydell@linaro.org
2025-01-28 18:40:19 +00:00
Peter Maydell
eda8d53083 target/arm: Use uint32_t in vfp_exceptbits_from_host()
In vfp_exceptbits_from_host(), we accumulate the FPSR flags in
an "int", and our return type is also "int". However, the only
callsite returns the same information as a uint32_t, and
more generally we handle FPSR values in the code as uint32_t,
not int. Bring this function in to line with that convention.

There is no behaviour change because none of the FPSR bits
we set in this function are bit 31. The input argument to
the function remains 'int' because that is the return type
of the softfloat get_float_exception_flags().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250124162836.2332150-6-peter.maydell@linaro.org
2025-01-28 18:40:19 +00:00
Peter Maydell
f10dee833f target/arm: Use FPSR_ constants in vfp_exceptbits_from_host()
Use the FPSR_ named constants in vfp_exceptbits_from_host(),
rather than hardcoded magic numbers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250124162836.2332150-5-peter.maydell@linaro.org
2025-01-28 18:40:19 +00:00
Peter Maydell
1edc3d43f2 target/arm: arm_reset_sve_state() should set FPSR, not FPCR
The pseudocode ResetSVEState() does:
    FPSR = ZeroExtend(0x0800009f<31:0>, 64);
but QEMU's arm_reset_sve_state() called vfp_set_fpcr() by accident.

Before the advent of FEAT_AFP, this was only setting a collection of
RES0 bits, which vfp_set_fpsr() would then ignore, so the only effect
was that we didn't actually set the FPSR the way we are supposed to
do.  Once FEAT_AFP is implemented, setting the bottom bits of FPSR
will change the floating point behaviour.

Call vfp_set_fpsr(), as we ought to.

(Note for stable backports: commit 7f2a01e736 moved this function
from sme_helper.c to helper.c, but it had the same bug before the
move too.)

Cc: qemu-stable@nongnu.org
Fixes: f84734b874 ("target/arm: Implement SMSTART, SMSTOP")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250124162836.2332150-4-peter.maydell@linaro.org
2025-01-28 18:40:19 +00:00
Stefan Hajnoczi
b5afd8c023 hppa updates
* Fixes booting a Linux kernel which is provided on the command line.
 * Allow more than 4GB RAM on 64-bit boxes
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCZ5PvvgAKCRD3ErUQojoP
 X7JQAQCn2MR4k4lfClDZHNmAFUNw51j56SB5HC/FCUKfOx4dCQD/Tf2OV/gstMOz
 nfpvIH6ouXZ2/p5npzTyOt+A8fwUpw0=
 =qrs7
 -----END PGP SIGNATURE-----

Merge tag 'hppa-system-for-v10-pull-request' of https://github.com/hdeller/qemu-hppa into staging

hppa updates

* Fixes booting a Linux kernel which is provided on the command line.
* Allow more than 4GB RAM on 64-bit boxes

# -----BEGIN PGP SIGNATURE-----
#
# iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCZ5PvvgAKCRD3ErUQojoP
# X7JQAQCn2MR4k4lfClDZHNmAFUNw51j56SB5HC/FCUKfOx4dCQD/Tf2OV/gstMOz
# nfpvIH6ouXZ2/p5npzTyOt+A8fwUpw0=
# =qrs7
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 24 Jan 2025 14:53:34 EST
# gpg:                using EDDSA key BCE9123E1AD29F07C049BBDEF712B510A23A0F5F
# gpg: Good signature from "Helge Deller <deller@gmx.de>" [unknown]
# gpg:                 aka "Helge Deller <deller@kernel.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 4544 8228 2CD9 10DB EF3D  25F8 3E5F 3D04 A7A2 4603
#      Subkey fingerprint: BCE9 123E 1AD2 9F07 C049  BBDE F712 B510 A23A 0F5F

* tag 'hppa-system-for-v10-pull-request' of https://github.com/hdeller/qemu-hppa:
  hw/hppa: Fix booting Linux kernel with initrd
  hw/hppa: Support up to 256 GiB RAM on 64-bit machines

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-01-27 11:20:21 -05:00
Helge Deller
c656f293df hw/hppa: Fix booting Linux kernel with initrd
Commit 20f7b89017 ("hw/hppa: Reset vCPUs calling resettable_reset()")
broke booting the Linux kernel with initrd which may have been provided
on the command line. The problem is, that the mentioned commit zeroes
out initial registers which were preset with addresses for the Linux
kernel and initrd.

Fix it by adding proper variables which are set shortly before starting
the firmware.

Signed-off-by: Helge Deller <deller@gmx.de>
Fixes: 20f7b89017 ("hw/hppa: Reset vCPUs calling resettable_reset()")
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2025-01-24 20:51:53 +01:00
Bibo Mao
3215fe8528 target/loongarch: Dump all generic CSR registers
CSR registers is import system control registers, it had better
dump all CSR registers when VM is running in system mode.

Here is dump output example of CSR registers:
 CSR000: CRMD   b4               PRMD   4                EUEN   0                MISC   0
 CSR004: ECFG   71c1c            ESTAT  0                ERA    9000000002c31300 BADV   12022c0e0
 CSR008: BADI   2b0000
 CSR012: EENTRY 90000000046b0000
 CSR016: TLBIDX ffffffff8e000228 TLBEHI 120228000        TLBELO0 400000016f19001f TLBELO1 400000016f1a401f
 CSR024: ASID   a0004            PGDL   90000001016f0000 PGDH   9000000004680000 PGD    0
 CSR028: PWCL   5e56e            PWCH   2e4              STLBPS e                RVACFG 0
 CSR032: CPUID  0                PRCFG1 72f8             PRCFG2 3ffff000         PRCFG3 8073f2
 CSR048: SAVE0  0                SAVE1  af9c             SAVE2  12010d6a8        SAVE3  8300000
 CSR052: SAVE4  0                SAVE5  0                SAVE6  0                SAVE7  0
 CSR064: TID    0                TCFG   8f0ca15          TVAL   4cefd8b          CNTC   fffffffffe688aaa
 CSR068: TICLR  0
 CSR096: LLBCTL 1
 CSR136: TLBRENTRY 46ba000       TLBRBADV ffff8000130d81e2 TLBRERA 9000000003585cb8 TLBRSAVE ffff8000130d81e0
 CSR140: TLBRELO0 1fe00043       TLBRELO1 40             TLBREHI ffff8000130d800e TLBRPRMD 0
 CSR384: DMW0   8000000000000001 DMW1   9000000000000011 DMW2   0                DMW3   0

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
2025-01-24 14:49:24 +08:00
Bibo Mao
b5b13eb712 target/loongarch: Set unused flag with CSR registers
On LA464, some CSR registers are not used such as CSR_SAVE8 -
CSR_SAVE15, also CSR registers relative with MCE is not used now.

Flag CSRFL_UNUSED is added for these registers, so that it will
not dumped. In order to keep compatiblity, these CSR registers are
not removed since it is used in vmstate already.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
2025-01-24 14:49:24 +08:00
Bibo Mao
cb6fa4142f target/loongarch: Add common source file for CSR register
Common source file csr.c is added here, it can be used by both
TCG mode and kvm mode. The common code is removed from file
tcg/insn_trans/trans_privileged.c.inc to csrc.c

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
2025-01-24 14:49:24 +08:00
Bibo Mao
d03114ea20 target/loongarch: Add common header file for CSR registers
Common header file csr.h is added here, it can be used by both
TCG mode and kvm mode.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
2025-01-24 14:49:24 +08:00
Bibo Mao
75b2c5da94 target/loongarch: Add generic csr function type
Parameter type TCGv and TCGv_ptr for function GenCSRRead and GenCSRWrite
is not used in non-TCG mode. Generic csr function type is added here
with parameter void type, so that it passes to compile with non-TCG mode.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
2025-01-24 14:49:24 +08:00
Bibo Mao
3156b1c1e9 target/loongarch: Remove static CSR function setting
Since CSR function setting is done dynamically in TCG mode, remove
static CSR function setting here.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
2025-01-24 14:49:24 +08:00
Bibo Mao
90f73c2d7f target/loongarch: Add dynamic function access with CSR register
With CSR register, dynamic function access is used for CSR register
access in TCG mode, so that csr info can be used by other modules.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
2025-01-24 14:49:24 +08:00
Tao Su
56e84d898f target/i386: Add new CPU model ClearwaterForest
According to table 1-2 in Intel Architecture Instruction Set Extensions
and Future Features (rev 056) [1], ClearwaterForest has the following new
features which have already been virtualized:

    - AVX-VNNI-INT16 CPUID.(EAX=7,ECX=1):EDX[bit 10]
    - SHA512 CPUID.(EAX=7,ECX=1):EAX[bit 0]
    - SM3 CPUID.(EAX=7,ECX=1):EAX[bit 1]
    - SM4 CPUID.(EAX=7,ECX=1):EAX[bit 2]

Add above features to new CPU model ClearwaterForest. Comparing with
SierraForest, ClearwaterForest bare-metal contains all features of
SierraForest-v2 CPU model and adds:

    - PREFETCHI CPUID.(EAX=7,ECX=1):EDX[bit 14]
    - DDPD_U CPUID.(EAX=7,ECX=2):EDX[bit 3]
    - BHI_NO IA32_ARCH_CAPABILITIES[bit 20]

Add above and all features of SierraForest-v2 CPU model to new CPU model
ClearwaterForest.

[1] https://cdrdv2.intel.com/v1/dl/getContent/671368

Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250121020650.1899618-4-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-23 11:50:53 +01:00
Tao Su
b611931d4f target/i386: Export BHI_NO bit to guests
Branch History Injection (BHI) is a CPU side-channel vulnerability, where
an attacker may manipulate branch history before transitioning from user
to supervisor mode or from VMX non-root/guest to root mode. CPUs that set
BHI_NO bit in MSR IA32_ARCH_CAPABILITIES to indicate no additional
mitigation is required to prevent BHI.

Make BHI_NO bit available to guests.

Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250121020650.1899618-3-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-23 11:50:53 +01:00
Tao Su
c597ff5339 target/i386: Introduce SierraForest-v2 model
Update SierraForest CPU model to add LAM, 4 bits indicating certain bits
of IA32_SPEC_CTR are supported(intel-psfd, ipred-ctrl, rrsba-ctrl,
bhi-ctrl) and the missing features(ss, tsc-adjust, cldemote, movdiri,
movdir64b)

Also add GDS-NO and RFDS-NO to indicate the related vulnerabilities are
mitigated in stepping 3.

Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250121020650.1899618-2-tao1.su@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-23 11:50:53 +01:00
Paolo Bonzini
22063f03a7 target/i386: avoid using s->tmp0 for add to implicit registers
For updates to implicit registers (RCX in LOOP instructions, RSI or RDI
in string instructions, or the stack pointer) do the add directly using
the registers (with no temporary) if 32-bit or 64-bit, or use a temporary
created for the occasion if 16-bit.  This is more efficient and removes
move instructions for the MO_TL case.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-14-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-23 11:50:53 +01:00
Paolo Bonzini
82290c7647 target/i386: extract common bits of gen_repz/gen_repz_nz
Now that everything has been cleaned up, look at DF and prefixes
in a single function, and call that one from gen_repz and gen_repz_nz.

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-23 11:50:44 +01:00
Paolo Bonzini
4f094e27f3 target/i386: pull computation of string update value out of loop
This is a common operation that is executed many times in rep
movs or rep stos loops.  It can improve performance by several
percentage points.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20241215090613.89588-13-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-23 11:35:33 +01:00
Paolo Bonzini
456709db50 target/i386: execute multiple REP/REPZ iterations without leaving TB
Use a TCG loop so that it is not necessary to go through the setup steps
of REP and through the I/O check on every iteration.  Interestingly, this
is not a particularly effective optimization on its own, though it avoids
the cost of correct RF emulation that was added in the previous patch.
The main benefit lies in allowing the hoisting of loop invariants outside
the loop, which will happen separately.

The loop exits when the low 16 bits of CX/ECX/RCX are zero (so generally
speaking the string operation runs in 65536 iteration batches) to give
the main loop an opportunity to pick up interrupts.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-12-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-23 11:35:33 +01:00
Paolo Bonzini
0360b78187 target/i386: optimize CX handling in repeated string operations
In a repeated string operation, CX/ECX will be decremented until it
is 0 but never underflow.  Use this observation to avoid a deposit or
zero-extend operation if the address size of the operation is smaller
than MO_TL.

As in the previous patch, the patch is structured to include some
preparatory work for subsequent changes.  In particular, introducing
cx_next prepares for when ECX will be decremented *before* calling
fn(s, ot), and therefore cannot yet be written back to cpu_regs.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-11-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-23 11:35:33 +01:00
Paolo Bonzini
3658116025 target/i386: do not use gen_op_jz_ecx for repeated string operations
Explicitly generate a TSTEQ branch (which is optimized to NE x,0 if possible).
This does not make much sense yet, but later we will add more checks and some
will use a temporary to check on the decremented value of CX/ECX/RCX; it will
be clearer for all checks to share the same logic using TSTEQ(reg, cx_mask).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-10-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-23 11:35:33 +01:00
Paolo Bonzini
6986cf0032 target/i386: make cc_op handling more explicit for repeated string instructions.
Since the cost of gen_update_cc_op() must be paid anyway, it's easier
to place them manually and not rely on spilling that is buried under
multiple levels of function calls.  While at it, clarify the circumstances
in which the gen_update_cc_op() is needed, and why it is not for REPxx
SCAS and REPxx CMPS.

And since cc_op will have been spilled at the point of a fault, just
make the whole insn CC_OP_DYNAMIC.  Once repz_opt is reintroduced,
a fault could happen either before or after the first execution of
CMPS/SCAS, and CC_OP_DYNAMIC sidesteps the complicated matter of what
x86_restore_state_to_opc would do.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20241215090613.89588-9-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-23 11:35:33 +01:00
Paolo Bonzini
0d82d9e846 target/i386: fix RF handling for string instructions
RF must be set on traps and interrupts from a string instruction,
except if they occur after the last iteration.  Ensure it is set
before giving the main loop a chance to execute.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-8-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-23 11:35:33 +01:00
Paolo Bonzini
4d7704ebc5 target/i386: tcg: move gen_set/reset_* earlier in the file
Allow using them in the code that translates REP/REPZ, without
forward declarations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-7-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-23 11:35:33 +01:00
Paolo Bonzini
0eb7046e1b target/i386: reorganize ops emitted by do_gen_rep, drop repz_opt
The condition for optimizing repeat instruction is more or less the
opposite of what you imagine: almost always the string instruction
was _not_ optimized and optimizing the loop relied on goto_tb.
This is obviously not great for performance, due to the cost of the
exit-to-main-loop check, but also wrong.  In fact, after expanding
dc->jmp_opt and simplifying "!!x" to "x", the condition for looping used
to be:

   ((cflags & CF_NO_GOTO_TB) ||
    (flags & (HF_RF_MASK | HF_TF_MASK | HF_INHIBIT_IRQ_MASK))) && !(cflags & CF_USE_ICOUNT)

In other words, setting aside RF (it requires special handling for REP
instructions and it was completely missing), repeat instruction were
being optimized if TF or inhibit IRQ flags were set.  This is certainly
wrong for TF, because string instructions trap after every execution,
and probably for interrupt shadow too.

Get rid of repz_opt completely.  The next patches will reintroduce the
optimization, applying it in the common case instead of the unlikely
and wrong one.

While at it, place the CX/ECX/RCX=0 case is at the end of the function,
which saves a label and is clearer when reading the generated ops.
For clarity, mark the cc_op explicitly as DYNAMIC even if at the end
of the translation block; the cc_op can come from either the previous
instruction or the string instruction, and currently we rely on
a gen_update_cc_op() that is hidden in the bowels of gen_jcc() to
spill cc_op and mark it clean.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-6-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-23 11:35:33 +01:00
Paolo Bonzini
d8d552d459 target/i386: unify choice between single and repeated string instructions
The same "if" is present in all generator functions for string instructions.
Push it inside gen_repz() and gen_repz_nz() instead.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20241215090613.89588-5-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-23 11:35:33 +01:00
Paolo Bonzini
b519556f58 target/i386: unify REP and REPZ/REPNZ generation
It only differs in a single call to gen_jcc, so use a "bool" argument
to distinguish the two cases; do not duplicate code.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-4-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-23 11:35:33 +01:00
Paolo Bonzini
e604be4fb4 target/i386: remove trailing 1 from gen_{j, cmov, set}cc1
This is not needed anymore now that gen_jcc has been eliminated
(merged into the similarly-named gen_Jcc, where the uppercase letter
gives away that it is an emission function).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-3-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-23 11:35:33 +01:00
Paolo Bonzini
6ace2d5163 target/i386: inline gen_jcc into sole caller
The code of gen_Jcc is very similar to gen_LOOP* and gen_JCXZ, but this
is hidden by gen_jcc.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-2-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-23 11:35:33 +01:00
Stefan Hajnoczi
32a97c5d05 tcg:
- Add TCGOP_TYPE, TCGOP_FLAGS.
   - Pass type and flags to tcg_op_supported, tcg_target_op_def.
   - Split out tcg-target-has.h and unexport from tcg.h.
   - Reorg constraint processing; constify TCGOpDef.
   - Make extract, sextract, deposit opcodes mandatory.
   - Merge ext{8,16,32}{s,u} opcodes into {s}extract.
 tcg/mips: Expand bswap unconditionally
 tcg/riscv: Use SRAIW, SRLIW for {s}extract_i64
 tcg/riscv: Use BEXTI for single-bit extractions
 tcg/sparc64: Use SRA, SRL for {s}extract_i64
 
 disas/riscv: Guard dec->cfg dereference for host disassemble
 util/cpuinfo-riscv: Detect Zbs
 accel/tcg: Call tcg_tb_insert() for one-insn TBs
 linux-user: Add missing /proc/cpuinfo fields for sparc
 -----BEGIN PGP SIGNATURE-----
 
 iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmeKnzUdHHJpY2hhcmQu
 aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV+Kvgf+LG9UjXlWF9GK923E
 TllBL2rLf1OOdtTXWO15VcvGMoWDwB3tVBdhihdvXmnWju+WbfMk6mct5NhzsKn9
 LmuugMIZs+hMROj+bgMK8x47jRIh5N2rDYxcEgmyfIpYb2o9qvyqKecGVRlSJTCE
 bmt5UFbvPThBb8upoMfq3F6evuMx0szBP7wrOwSR/VGpmzIr20UTEWo6I1ALp4uj
 paFaysYol4em3dIhkiuV9cL7E0EIObaNa7l9RUci/BmTq+JaVxUnW1Y2i0PEwKwG
 FJSfYTJk3wBgAVxC2zC2g3ZM7uKuecSXMpiFopTiuyQLp7Q61i9kCNvEq0qY5tdb
 DaqR/g==
 =cv4O
 -----END PGP SIGNATURE-----

Merge tag 'pull-tcg-20250117' of https://gitlab.com/rth7680/qemu into staging

tcg:
  - Add TCGOP_TYPE, TCGOP_FLAGS.
  - Pass type and flags to tcg_op_supported, tcg_target_op_def.
  - Split out tcg-target-has.h and unexport from tcg.h.
  - Reorg constraint processing; constify TCGOpDef.
  - Make extract, sextract, deposit opcodes mandatory.
  - Merge ext{8,16,32}{s,u} opcodes into {s}extract.
tcg/mips: Expand bswap unconditionally
tcg/riscv: Use SRAIW, SRLIW for {s}extract_i64
tcg/riscv: Use BEXTI for single-bit extractions
tcg/sparc64: Use SRA, SRL for {s}extract_i64

disas/riscv: Guard dec->cfg dereference for host disassemble
util/cpuinfo-riscv: Detect Zbs
accel/tcg: Call tcg_tb_insert() for one-insn TBs
linux-user: Add missing /proc/cpuinfo fields for sparc

# -----BEGIN PGP SIGNATURE-----
#
# iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmeKnzUdHHJpY2hhcmQu
# aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV+Kvgf+LG9UjXlWF9GK923E
# TllBL2rLf1OOdtTXWO15VcvGMoWDwB3tVBdhihdvXmnWju+WbfMk6mct5NhzsKn9
# LmuugMIZs+hMROj+bgMK8x47jRIh5N2rDYxcEgmyfIpYb2o9qvyqKecGVRlSJTCE
# bmt5UFbvPThBb8upoMfq3F6evuMx0szBP7wrOwSR/VGpmzIr20UTEWo6I1ALp4uj
# paFaysYol4em3dIhkiuV9cL7E0EIObaNa7l9RUci/BmTq+JaVxUnW1Y2i0PEwKwG
# FJSfYTJk3wBgAVxC2zC2g3ZM7uKuecSXMpiFopTiuyQLp7Q61i9kCNvEq0qY5tdb
# DaqR/g==
# =cv4O
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 17 Jan 2025 13:19:33 EST
# gpg:                using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg:                issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full]
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* tag 'pull-tcg-20250117' of https://gitlab.com/rth7680/qemu: (68 commits)
  softfloat: Constify helpers returning float_status field
  accel/tcg: Call tcg_tb_insert() for one-insn TBs
  tcg: Document tb_lookup() and tcg_tb_lookup()
  linux-user: Add missing /proc/cpuinfo fields for sparc
  tcg/riscv: Use BEXTI for single-bit extractions
  util/cpuinfo-riscv: Detect Zbs
  tcg: Remove TCG_TARGET_HAS_deposit_{i32,i64}
  tcg: Remove TCG_TARGET_HAS_{s}extract_{i32,i64}
  tcg/tci: Remove assertions for deposit and extract
  tcg/tci: Provide TCG_TARGET_{s}extract_valid
  tcg/sparc64: Use SRA, SRL for {s}extract_i64
  tcg/s390x: Fold the ext{8,16,32}[us] cases into {s}extract
  tcg/riscv: Use SRAIW, SRLIW for {s}extract_i64
  tcg/riscv64: Fold the ext{8,16,32}[us] cases into {s}extract
  tcg/ppc: Fold the ext{8,16,32}[us] cases into {s}extract
  tcg/mips: Fold the ext{8,16,32}[us] cases into {s}extract
  tcg/loongarch64: Fold the ext{8,16,32}[us] cases into {s}extract
  tcg/arm: Add full [US]XT[BH] into {s}extract
  tcg/aarch64: Expand extract with offset 0 with andi
  tcg/aarch64: Provide TCG_TARGET_{s}extract_valid
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-01-21 08:28:33 -05:00
Alexey Baturo
941f76e293 target/riscv: Support Supm and Sspm as part of Zjpm v1.0
The Zjpm v1.0 spec states there should be Supm and Sspm extensions that
are used in profile specification. Enabling Supm extension enables both
Ssnpm and Smnpm, while Sspm enables only Smnpm.

Signed-off-by: Alexey Baturo <baturo.alexey@gmail.com>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-ID: <20250113194410.1307494-1-baturo.alexey@gmail.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2025-01-19 09:44:35 +10:00
Clément Léger
2d8e825928 target/riscv: Add Smdbltrp ISA extension enable switch
Add the switch to enable the Smdbltrp ISA extension and disable it for
the max cpu. Indeed, OpenSBI when Smdbltrp is present, M-mode double
trap is enabled by default and MSTATUS.MDT needs to be cleared to avoid
taking a double trap. OpenSBI does not currently support it so disable
it for the max cpu to avoid breaking regression tests.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-ID: <20250116131539.2475785-1-cleger@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2025-01-19 09:44:35 +10:00
Clément Léger
00af7d5360 target/riscv: Implement Smdbltrp behavior
When the Smsdbltrp ISA extension is enabled, if a trap happens while
MSTATUS.MDT is already set, it will trigger an abort or an NMI is the
Smrnmi extension is available.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250110125441.3208676-9-cleger@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2025-01-19 09:44:35 +10:00