Subject: [PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES
From: Peter Maydell
Date: Fri, 24 Jan 2025 16:27:20 +0000
This patchset implements emulation of the Arm FEAT_AFP and FEAT_RPRES
extensions, which are floating-point related. It's based on the
small i386 bugfix series I sent out a while back:
Based-on: 20250116112536.4117889-1-peter.maydell@linaro.org
("target/i386: Fix 0 * Inf + QNaN regression")
(It would also have been based on an initial refactoring series
I sent out on Monday, but AFAICT the list just ate those emails
and they never arrived anywhere :-( So you get a bigger series
here than I'd hoped.)
If you'd rather have these patches as a git branch:
https://git.linaro.org/people/pmaydell/qemu-arm.git feat-afp
with human readable web view at:
https://git.linaro.org/people/peter.maydell/qemu-arm.git/log/?h=feat-afp
FEAT_AFP defines three new control bits in the FPCR, whose
operations are basically independent of each other:
* FPCR.AH: "alternate floating point mode"; this changes floating
point behaviour in a variety of ways, including:
- the sign of a default NaN is 1, not 0
- if FPCR.FZ is also 1, outputs are flushed to zero when they are
denormal after rounding with an unbounded exponent has been applied
- FPCR.FZ does not cause denormalized inputs to be flushed to zero
- miscellaneous other corner-case behaviour changes
* FPCR.FIZ: flush denormalized numbers to zero on input for
most instructions
* FPCR.NEP: makes scalar SIMD operations merge the result with
higher vector elements in one of the source registers, instead
of zeroing the higher elements of the destination
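As an illustration of the FPCR.NEP behaviour described above, here is a minimal sketch in plain C. It is not QEMU code: the function name, the four-lane float array standing in for a vector register, and the choice of the first source as the register to merge from are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4 /* assumption: four 32-bit lanes, like a 128-bit register */

/*
 * Hypothetical model of a scalar single-precision add that writes a
 * vector register. Lane 0 always receives the computed result; "nep"
 * selects what happens to the higher lanes.
 */
void scalar_fadd_model(float dst[LANES], const float src_n[LANES],
                       const float src_m[LANES], int nep)
{
    float merged[LANES];

    merged[0] = src_n[0] + src_m[0];
    for (size_t i = 1; i < LANES; i++) {
        /* NEP = 1: keep the source's higher lanes; NEP = 0: zero them. */
        merged[i] = nep ? src_n[i] : 0.0f;
    }
    /* Copy out at the end so that dst may alias either source. */
    for (size_t i = 0; i < LANES; i++) {
        dst[i] = merged[i];
    }
}

/* Small self-check showing both settings of the (modelled) NEP bit. */
void nep_demo(void)
{
    float n[LANES] = {1.0f, 2.0f, 3.0f, 4.0f};
    float m[LANES] = {10.0f, 20.0f, 30.0f, 40.0f};
    float d[LANES];

    scalar_fadd_model(d, n, m, 0);
    assert(d[0] == 11.0f && d[1] == 0.0f && d[3] == 0.0f);

    scalar_fadd_model(d, n, m, 1);
    assert(d[0] == 11.0f && d[1] == 2.0f && d[3] == 4.0f);
}
```

Which source register supplies the higher elements depends on the instruction; the sketch only shows the difference between merging and zeroing.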
FEAT_RPRES makes single-precision FRECPE and FRSQRTE use a 12-bit
mantissa precision instead of 8-bit when FPCR.AH is set.
Because FPCR.AH implies quite a lot of changes to corner cases
of floating point handling, the resulting patch series is regrettably
quite big.
Structure of the patch series:
* patch 1 fixes a silly bug in arm_reset_sve_state() which only
has a major bad effect once FEAT_AFP is implemented
* patches 2-16 are a refactoring which splits the existing
fp_status and fp_status_f16 so that each has separate a32 and
a64 versions. We need this because the FEAT_AFP bits only have
an effect for A64 insns, not A32 insns
* patches 17-22 add some more functionality to softfloat that we
need for FEAT_AFP:
- an exception flag float_flag_input_denormal_used is set when
an input to an fp op is denormal, is not squashed to zero,
and is actually consumed (i.e. not an invalid operation or
an operation where the other input was a NaN)
- a control setting float_detect_ftz which lets the target
control whether flush-to-zero of outputs should be done
before or after rounding
(Both these are needed for correct x86 FP emulation, incidentally.)
* patches 23-28 define the FPCR bits and implement the parts of the
functionality which can be handled by setting softfloat control
knobs and adjusting how we handle softfloat exception flags.
(This includes all of the FPCR.FIZ behaviour.)
* patches 29-33 implement FPCR.AH handling of a small group of
insns (FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS, BFCVT*, BFMLAL*,
BFMLSL*) which must:
- never update FPSR exception flags
- always round-to-nearest-even
- always flush single and double denormal inputs and outputs to zero
We implement this via some new float_status fields that we use for
this group of insns.
* patches 34-42 implement the FPCR.NEP "merge high vector elements of
a source register with the result of a scalar operation" behaviour
* patches 43-49 implement FPCR.AH semantics for FMIN and FMAX:
- comparing two zeroes (even of different sign) or comparing a NaN
with anything always returns the second argument (possibly
squashed to zero)
- denormal outputs are not squashed to zero regardless of FZ or FZ16
* patches 50-65 implement FPCR.AH semantics for abs and neg of floating
point values: they must not change the sign bit of a NaN. This applies
not just to the ABS and NEG insns but to any other insn whose
pseudocode has it doing an FPAbs() or FPNeg() operation (e.g.
FMLS, FRECPS, FTSSEL).
* at this point patch 66 can enable FEAT_AFP for -cpu max
* patches 67-70 implement FEAT_RPRES
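To make two of the FPCR.AH rules in the list above concrete, here is a rough sketch in standalone C of the FMIN and negation behaviour for single precision. It is not taken from the patches: the function names are hypothetical, exception flags are ignored, and the "possibly squashed to zero" step for the returned second argument is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Helpers to inspect the raw IEEE 754 binary32 encoding. */
static uint32_t f32_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}

static float f32_from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}

static int f32_is_nan(uint32_t u)
{
    return (u & 0x7fffffffu) > 0x7f800000u;
}

static int f32_is_zero(uint32_t u)
{
    return (u & 0x7fffffffu) == 0;
}

/*
 * Model of FMIN with FPCR.AH = 1: if either input is a NaN, or both
 * inputs are zeroes (of any sign), the second argument is returned.
 */
float fmin_ah_model(float a, float b)
{
    uint32_t ua = f32_bits(a), ub = f32_bits(b);

    if (f32_is_nan(ua) || f32_is_nan(ub) ||
        (f32_is_zero(ua) && f32_is_zero(ub))) {
        return b;
    }
    return a < b ? a : b;
}

/* Model of negation with FPCR.AH = 1: a NaN keeps its sign bit. */
float fneg_ah_model(float a)
{
    uint32_t u = f32_bits(a);

    if (f32_is_nan(u)) {
        return a;
    }
    return f32_from_bits(u ^ 0x80000000u);
}

/* Small self-check of the corner cases named in the text. */
void ah_demo(void)
{
    float qnan = f32_from_bits(0x7fc00000u);

    /* Two zeroes of different sign: the second argument wins. */
    assert(f32_bits(fmin_ah_model(0.0f, -0.0f)) == 0x80000000u);
    assert(f32_bits(fmin_ah_model(-0.0f, 0.0f)) == 0x00000000u);
    /* A NaN first operand is discarded in favour of the second. */
    assert(fmin_ah_model(qnan, 1.0f) == 1.0f);
    /* Negating a NaN leaves the encoding unchanged. */
    assert(f32_bits(fneg_ah_model(qnan)) == 0x7fc00000u);
}
```

The sketch works on the raw bit pattern so that the sign of a zero or a NaN is visible, since ordinary float comparisons cannot distinguish them.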
I also have some patches which make target/i386 use the "detect
flush to zero after rounding" and "report when input denormal is
consumed" softfloat features added here; I don't include them in
this patchset (though you can find them in that git branch I
mentioned earlier) because I haven't done as much testing on the
i386 side and in any case this patchset is already pretty long.
I expect I'll send them out when this series has been merged.
thanks
-- PMM
Peter Maydell (76):
target/i386: Do not raise Invalid for 0 * Inf + QNaN
tests/tcg/x86_64/fma: Test some x86 fused-multiply-add cases
target/arm: arm_reset_sve_state() should set FPSR, not FPCR
target/arm: Use FPSR_ constants in vfp_exceptbits_from_host()
target/arm: Use uint32_t in vfp_exceptbits_from_host()
target/arm: Define new fp_status_a32 and fp_status_a64
target/arm: Use vfp.fp_status_a64 in A64-only helper functions
target/arm: Use fp_status_a32 in vjcvt helper
target/arm: Use fp_status_a32 in vfp_cmp helpers
target/arm: Use FPST_FPCR_A32 in A32 decoder
target/arm: Use FPST_FPCR_A64 in A64 decoder
target/arm: Remove now-unused vfp.fp_status and FPST_FPCR
target/arm: Define new fp_status_f16_a32 and fp_status_f16_a64
target/arm: Use fp_status_f16_a32 in AArch32-only helpers
target/arm: Use fp_status_f16_a64 in AArch64-only helpers
target/arm: Use FPST_FPCR_F16_A32 in A32 decoder
target/arm: Use FPST_FPCR_F16_A64 in A64 decoder
target/arm: Remove now-unused vfp.fp_status_f16 and FPST_FPCR_F16
fpu: Rename float_flag_input_denormal to
float_flag_input_denormal_flushed
fpu: Rename float_flag_output_denormal to
float_flag_output_denormal_flushed
fpu: Fix a comment in softfloat-types.h
fpu: Add float_class_denormal
fpu: Implement float_flag_input_denormal_used
fpu: allow flushing of output denormals to be after rounding
target/arm: Remove redundant advsimd float16 helpers
target/arm: Use FPST_FPCR_F16_A64 for halfprec-to-other conversions
target/arm: Define FPCR AH, FIZ, NEP bits
target/arm: Implement FPCR.FIZ handling
target/arm: Adjust FP behaviour for FPCR.AH = 1
target/arm: Adjust exception flag handling for AH = 1
target/arm: Add FPCR.AH to tbflags
target/arm: Set up float_status to use for FPCR.AH=1 behaviour
target/arm: Use FPST_FPCR_AH for FRECPE, FRECPS, FRECPX, FRSQRTE,
FRSQRTS
target/arm: Use FPST_FPCR_AH for BFCVT* insns
target/arm: Use FPST_FPCR_AH for BFMLAL*, BFMLSL* insns
target/arm: Add FPCR.NEP to TBFLAGS
target/arm: Define and use new write_fp_*reg_merging() functions
target/arm: Handle FPCR.NEP for 3-input scalar operations
target/arm: Handle FPCR.NEP for BFCVT scalar
target/arm: Handle FPCR.NEP for 1-input scalar operations
target/arm: Handle FPCR.NEP in do_cvtf_scalar()
target/arm: Handle FPCR.NEP for scalar FABS and FNEG
target/arm: Handle FPCR.NEP for FCVTXN (scalar)
target/arm: Handle FPCR.NEP for FMUL, FMULX scalar by element
target/arm: Implement FPCR.AH semantics for scalar FMIN/FMAX
target/arm: Implement FPCR.AH semantics for vector FMIN/FMAX
target/arm: Implement FPCR.AH semantics for FMAXV and FMINV
target/arm: Implement FPCR.AH semantics for FMINP and FMAXP
target/arm: Implement FPCR.AH semantics for SVE FMAXV and FMINV
target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX immediate
target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX vector
target/arm: Implement FPCR.AH handling of negation of NaN
target/arm: Implement FPCR.AH handling for scalar FABS and FABD
target/arm: Handle FPCR.AH in vector FABD
target/arm: Handle FPCR.AH in SVE FNEG
target/arm: Handle FPCR.AH in SVE FABS
target/arm: Handle FPCR.AH in SVE FABD
target/arm: Handle FPCR.AH in negation steps in FCADD
target/arm: Handle FPCR.AH in negation steps in SVE FCADD
target/arm: Handle FPCR.AH in FMLSL
target/arm: Handle FPCR.AH in FRECPS and FRSQRTS scalar insns
target/arm: Handle FPCR.AH in FRECPS and FRSQRTS vector insns
target/arm: Handle FPCR.AH in negation step in FMLS (indexed)
target/arm: Handle FPCR.AH in negation in FMLS (vector)
target/arm: Handle FPCR.AH in negation step in SVE FMLS (vector)
target/arm: Handle FPCR.AH in SVE FTSSEL
target/arm: Handle FPCR.AH in SVE FTMAD
target/arm: Enable FEAT_AFP for '-cpu max'
target/arm: Plumb FEAT_RPRES frecpe and frsqrte through to new helper
target/arm: Implement increased precision FRECPE
target/arm: Implement increased precision FRSQRTE
target/arm: Enable FEAT_RPRES for -cpu max
target/i386: Detect flush-to-zero after rounding
target/i386: Use correct type for get_float_exception_flags() values
target/i386: Wire up MXCSR.DE and FPUS.DE correctly
tests/tcg/x86_64/fma: add test for exact-denormal output
docs/system/arm/emulation.rst | 2 +
include/fpu/softfloat-helpers.h | 11 +
include/fpu/softfloat-types.h | 51 +-
target/arm/cpu-features.h | 10 +
target/arm/cpu.h | 32 +-
target/arm/helper.h | 12 +
target/arm/internals.h | 6 +
target/arm/tcg/helper-a64.h | 21 +-
target/arm/tcg/helper-sve.h | 120 +++++
target/arm/tcg/translate.h | 63 ++-
target/i386/ops_sse.h | 16 +-
target/mips/fpu_helper.h | 6 +
fpu/softfloat.c | 71 ++-
target/alpha/cpu.c | 7 +
target/arm/cpu.c | 32 +-
target/arm/helper.c | 4 +-
target/arm/tcg/cpu64.c | 2 +
target/arm/tcg/helper-a64.c | 173 ++++---
target/arm/tcg/hflags.c | 13 +
target/arm/tcg/sme_helper.c | 6 +-
target/arm/tcg/sve_helper.c | 301 ++++++++---
target/arm/tcg/translate-a64.c | 850 ++++++++++++++++++++++++-------
target/arm/tcg/translate-sme.c | 4 +-
target/arm/tcg/translate-sve.c | 280 ++++++----
target/arm/tcg/translate-vfp.c | 78 +--
target/arm/tcg/vec_helper.c | 174 ++++++-
target/arm/vfp_helper.c | 369 +++++++++++---
target/hppa/fpu_helper.c | 11 +
target/i386/tcg/fpu_helper.c | 110 ++--
target/m68k/fpu_helper.c | 2 +-
target/mips/msa.c | 9 +
target/mips/tcg/msa_helper.c | 4 +-
target/ppc/cpu_init.c | 3 +
target/rx/cpu.c | 8 +
target/rx/op_helper.c | 4 +-
target/sh4/cpu.c | 8 +
target/tricore/fpu_helper.c | 6 +-
target/tricore/helper.c | 1 +
tests/fp/fp-bench.c | 1 +
tests/tcg/x86_64/fma.c | 116 +++++
fpu/softfloat-parts.c.inc | 136 ++++-
tests/tcg/x86_64/Makefile.target | 1 +
42 files changed, 2443 insertions(+), 691 deletions(-)
create mode 100644 tests/tcg/x86_64/fma.c
--
2.34.1