qemu-devel

[PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES

From:	Peter Maydell
Subject:	[PATCH 00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES
Date:	Fri, 24 Jan 2025 16:27:20 +0000

This patchset implements emulation of the Arm FEAT_AFP and FEAT_RPRES
extensions, which are floating-point related. It's based on the
small i386 bugfix series I sent out a while back:

Based-on: 20250116112536.4117889-1-peter.maydell@linaro.org
("target/i386: Fix 0 * Inf + QNaN regression")

(It would also have been based on an initial refactoring series
I sent out on Monday, but AFAICT the list just ate those emails
and they never arrived anywhere :-(  So you get a bigger series
here than I'd hoped.)

If you'd rather have these patches as a git branch:
 https://git.linaro.org/people/pmaydell/qemu-arm.git  feat-afp
with human readable web view at:
 https://git.linaro.org/people/peter.maydell/qemu-arm.git/log/?h=feat-afp


FEAT_AFP defines three new control bits in the FPCR, whose
operations are basically independent of each other:
 * FPCR.AH: "alternate floating point mode"; this changes floating
   point behaviour in a variety of ways, including:
    - the sign of a default NaN is 1, not 0
    - if FPCR.FZ is also 1, denormal results are detected after rounding
      (with an unbounded exponent range) and are then flushed to zero
    - FPCR.FZ does not cause denormalized inputs to be flushed to zero
    - miscellaneous other corner-case behaviour changes
 * FPCR.FIZ: flush denormalized numbers to zero on input for
   most instructions
 * FPCR.NEP: makes scalar SIMD operations merge the result with
   higher vector elements in one of the source registers, instead
   of zeroing the higher elements of the destination
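
As a rough illustration of the FPCR.NEP behaviour described above, here
is a minimal standalone C sketch. It is not QEMU code: vreg_model and
write_scalar_result() are hypothetical names invented for this example,
and the register is modelled as four single-precision lanes. Which source
register supplies the upper lanes depends on the instruction; the sketch
just takes it as a parameter.

```c
#include <stdbool.h>
#include <string.h>

#define NELEM 4   /* model a 128-bit register as four 32-bit lanes */

/* Hypothetical register model, for illustration only. */
typedef struct {
    float lane[NELEM];
} vreg_model;

/*
 * Write the result of a scalar operation to a destination register.
 * nep == false: the higher lanes of the destination are zeroed.
 * nep == true:  the higher lanes are copied from merge_src instead.
 */
static vreg_model write_scalar_result(float result,
                                      const vreg_model *merge_src,
                                      bool nep)
{
    vreg_model d;

    if (nep) {
        d = *merge_src;              /* keep the source's higher lanes */
    } else {
        memset(&d, 0, sizeof(d));    /* all-zero bits, i.e. +0.0 lanes */
    }
    d.lane[0] = result;              /* lane 0 always gets the result */
    return d;
}
```

With a source of {1, 2, 3, 4} and a scalar result of 9, this gives
{9, 0, 0, 0} when nep is false and {9, 2, 3, 4} when it is true.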

FEAT_RPRES makes single-precision FRECPE and FRSQRTE use a 12-bit
mantissa precision instead of 8-bit when FPCR.AH is set.

Because FPCR.AH implies quite a lot of changes to corner cases
of floating point handling, the resulting patchseries is regrettably
quite big.

Structure of the patchseries:
 * patch 1 fixes a silly bug in arm_reset_sve_state() which only
   has a major bad effect once FEAT_AFP is implemented
 * patches 2-16 are a refactoring which splits the existing
   fp_status and fp_status_f16 so that each has separate a32 and
   a64 versions. We need this because the FEAT_AFP bits only have
   an effect for A64 insns, not A32 insns
 * patches 17-22 add some more functionality to softfloat that we
   need for FEAT_AFP:
    - an exception flag float_flag_input_denormal_used is set when
      an input to an fp op is denormal, is not squashed to zero,
      and is actually consumed (i.e. not an invalid operation or
      an operation where the other input was a NaN)
    - a control setting float_detect_ftz which lets the target
      control whether flush-to-zero of outputs should be done
      before or after rounding
   (Both these are needed for correct x86 FP emulation, incidentally.)
 * patches 23-28 define the FPCR bits and implement the parts of the
   functionality which can be handled by setting softfloat control
   knobs and adjusting how we handle softfloat exception flags.
   (This includes all of the FPCR.FIZ behaviour.)
 * patches 29-33 implement FPCR.AH handling of a small group of
   insns (FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS, BFCVT*, BFMLAL*,
   BFMLSL*) which must:
    - never update FPSR exception flags
    - always round-to-nearest-even
    - always flush single and double denormal inputs and outputs to zero
   We implement this via some new float_status fields that we use for
   this group of insns.
 * patches 34-42 implement the FPCR.NEP "merge high vector elements of
   a source register with the result of a scalar operation" behaviour
 * patches 43-49 implement FPCR.AH semantics for FMIN and FMAX:
    - comparing two zeroes (even of different sign) or comparing a NaN
      with anything always returns the second argument (possibly
      squashed to zero)
    - denormal outputs are not squashed to zero regardless of FZ or FZ16
 * patches 50-65 implement FPCR.AH semantics for abs and neg of floating
   point values: they must not change the sign bit of a NaN. This applies
   not just to the ABS and NEG insns but to any other insn whose
   pseudocode has it doing an FPAbs() or FPNeg() operation (e.g.
   FMLS, FRECPS, FTSSEL).
 * at this point patch 66 can enable FEAT_AFP for -cpu max
 * patches 67-70 implement FEAT_RPRES
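
For readers who want a feel for the FPCR.AH rules for FMIN and for
negation listed above, here is a small standalone C model. It is only a
sketch: fneg_model() and fmin_ah_model() are hypothetical names, not
helpers from this series, and the sketch leaves out the squashing of
denormal inputs to zero, exception flags, and any distinction between
quiet and signalling NaNs.

```c
#include <math.h>
#include <stdbool.h>

/*
 * Negate x. With ah set, a NaN is passed through with its sign bit
 * unchanged; otherwise the sign bit is always flipped.
 */
static float fneg_model(float x, bool ah)
{
    return (ah && isnan(x)) ? x : -x;
}

/*
 * Minimum of a and b under the FPCR.AH = 1 rules: if either input is a
 * NaN, or both inputs are zero (of either sign), the second argument is
 * returned. Otherwise the smaller value is returned.
 */
static float fmin_ah_model(float a, float b)
{
    if (isnan(a) || isnan(b) || (a == 0.0f && b == 0.0f)) {
        return b;
    }
    return a < b ? a : b;
}
```

Note that this makes the operation non-commutative for the special
cases: fmin_ah_model(-0.0f, 0.0f) is +0 while fmin_ah_model(0.0f, -0.0f)
is -0, and a NaN in the first argument is dropped in favour of the
second argument.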

I also have some patches which make target/i386 use the "detect
flush to zero after rounding" and "report when input denormal is
consumed" softfloat features added here; I don't include them in
this patchset (though you can find them in that git branch I
mentioned earlier) because I haven't done as much testing on the
i386 side and in any case this patchset is already pretty long.
I expect I'll send them out when this series has been merged.


thanks
-- PMM


Peter Maydell (76):
  target/i386: Do not raise Invalid for 0 * Inf + QNaN
  tests/tcg/x86_64/fma: Test some x86 fused-multiply-add cases
  target/arm: arm_reset_sve_state() should set FPSR, not FPCR
  target/arm: Use FPSR_ constants in vfp_exceptbits_from_host()
  target/arm: Use uint32_t in vfp_exceptbits_from_host()
  target/arm: Define new fp_status_a32 and fp_status_a64
  target/arm: Use vfp.fp_status_a64 in A64-only helper functions
  target/arm: Use fp_status_a32 in vjvct helper
  target/arm: Use fp_status_a32 in vfp_cmp helpers
  target/arm: Use FPST_FPCR_A32 in A32 decoder
  target/arm: Use FPST_FPCR_A64 in A64 decoder
  target/arm: Remove now-unused vfp.fp_status and FPST_FPCR
  target/arm: Define new fp_status_f16_a32 and fp_status_f16_a64
  target/arm: Use fp_status_f16_a32 in AArch32-only helpers
  target/arm: Use fp_status_f16_a64 in AArch64-only helpers
  target/arm: Use FPST_FPCR_F16_A32 in A32 decoder
  target/arm: Use FPST_FPCR_F16_A64 in A64 decoder
  target/arm: Remove now-unused vfp.fp_status_f16 and FPST_FPCR_F16
  fpu: Rename float_flag_input_denormal to
    float_flag_input_denormal_flushed
  fpu: Rename float_flag_output_denormal to
    float_flag_output_denormal_flushed
  fpu: Fix a comment in softfloat-types.h
  fpu: Add float_class_denormal
  fpu: Implement float_flag_input_denormal_used
  fpu: allow flushing of output denormals to be after rounding
  target/arm: Remove redundant advsimd float16 helpers
  target/arm: Use FPST_FPCR_F16_A64 for halfprec-to-other conversions
  target/arm: Define FPCR AH, FIZ, NEP bits
  target/arm: Implement FPCR.FIZ handling
  target/arm: Adjust FP behaviour for FPCR.AH = 1
  target/arm: Adjust exception flag handling for AH = 1
  target/arm: Add FPCR.AH to tbflags
  target/arm: Set up float_status to use for FPCR.AH=1 behaviour
  target/arm: Use FPST_FPCR_AH for FRECPE, FRECPS, FRECPX, FRSQRTE,
    FRSQRTS
  target/arm: Use FPST_FPCR_AH for BFCVT* insns
  target/arm: Use FPST_FPCR_AH for BFMLAL*, BFMLSL* insns
  target/arm: Add FPCR.NEP to TBFLAGS
  target/arm: Define and use new write_fp_*reg_merging() functions
  target/arm: Handle FPCR.NEP for 3-input scalar operations
  target/arm: Handle FPCR.NEP for BFCVT scalar
  target/arm: Handle FPCR.NEP for 1-input scalar operations
  target/arm: Handle FPCR.NEP in do_cvtf_scalar()
  target/arm: Handle FPCR.NEP for scalar FABS and FNEG
  target/arm: Handle FPCR.NEP for FCVTXN (scalar)
  target/arm: Handle FPCR.NEP for NEP for FMUL, FMULX scalar by element
  target/arm: Implement FPCR.AH semantics for scalar FMIN/FMAX
  target/arm: Implement FPCR.AH semantics for vector FMIN/FMAX
  target/arm: Implement FPCR.AH semantics for FMAXV and FMINV
  target/arm: Implement FPCR.AH semantics for FMINP and FMAXP
  target/arm: Implement FPCR.AH semantics for SVE FMAXV and FMINV
  target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX immediate
  target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX vector
  target/arm: Implement FPCR.AH handling of negation of NaN
  target/arm: Implement FPCR.AH handling for scalar FABS and FABD
  target/arm: Handle FPCR.AH in vector FABD
  target/arm: Handle FPCR.AH in SVE FNEG
  target/arm: Handle FPCR.AH in SVE FABS
  target/arm: Handle FPCR.AH in SVE FABD
  target/arm: Handle FPCR.AH in negation steps in FCADD
  target/arm: Handle FPCR.AH in negation steps in SVE FCADD
  target/arm: Handle FPCR.AH in FMLSL
  target/arm: Handle FPCR.AH in FRECPS and FRSQRTS scalar insns
  target/arm: Handle FPCR.AH in FRECPS and FRSQRTS vector insns
  target/arm: Handle FPCR.AH in negation step in FMLS (indexed)
  target/arm: Handle FPCR.AH in negation in FMLS (vector)
  target/arm: Handle FPCR.AH in negation step in SVE FMLS (vector)
  target/arm: Handle FPCR.AH in SVE FTSSEL
  target/arm: Handle FPCR.AH in SVE FTMAD
  target/arm: Enable FEAT_AFP for '-cpu max'
  target/arm: Plumb FEAT_RPRES frecpe and frsqrte through to new helper
  target/arm: Implement increased precision FRECPE
  target/arm: Implement increased precision FRSQRTE
  target/arm: Enable FEAT_RPRES for -cpu max
  target/i386: Detect flush-to-zero after rounding
  target/i386: Use correct type for get_float_exception_flags() values
  target/i386: Wire up MXCSR.DE and FPUS.DE correctly
  tests/tcg/x86_64/fma: add test for exact-denormal output

 docs/system/arm/emulation.rst    |   2 +
 include/fpu/softfloat-helpers.h  |  11 +
 include/fpu/softfloat-types.h    |  51 +-
 target/arm/cpu-features.h        |  10 +
 target/arm/cpu.h                 |  32 +-
 target/arm/helper.h              |  12 +
 target/arm/internals.h           |   6 +
 target/arm/tcg/helper-a64.h      |  21 +-
 target/arm/tcg/helper-sve.h      | 120 +++++
 target/arm/tcg/translate.h       |  63 ++-
 target/i386/ops_sse.h            |  16 +-
 target/mips/fpu_helper.h         |   6 +
 fpu/softfloat.c                  |  71 ++-
 target/alpha/cpu.c               |   7 +
 target/arm/cpu.c                 |  32 +-
 target/arm/helper.c              |   4 +-
 target/arm/tcg/cpu64.c           |   2 +
 target/arm/tcg/helper-a64.c      | 173 ++++---
 target/arm/tcg/hflags.c          |  13 +
 target/arm/tcg/sme_helper.c      |   6 +-
 target/arm/tcg/sve_helper.c      | 301 ++++++++---
 target/arm/tcg/translate-a64.c   | 850 ++++++++++++++++++++++++-------
 target/arm/tcg/translate-sme.c   |   4 +-
 target/arm/tcg/translate-sve.c   | 280 ++++++----
 target/arm/tcg/translate-vfp.c   |  78 +--
 target/arm/tcg/vec_helper.c      | 174 ++++++-
 target/arm/vfp_helper.c          | 369 +++++++++++---
 target/hppa/fpu_helper.c         |  11 +
 target/i386/tcg/fpu_helper.c     | 110 ++--
 target/m68k/fpu_helper.c         |   2 +-
 target/mips/msa.c                |   9 +
 target/mips/tcg/msa_helper.c     |   4 +-
 target/ppc/cpu_init.c            |   3 +
 target/rx/cpu.c                  |   8 +
 target/rx/op_helper.c            |   4 +-
 target/sh4/cpu.c                 |   8 +
 target/tricore/fpu_helper.c      |   6 +-
 target/tricore/helper.c          |   1 +
 tests/fp/fp-bench.c              |   1 +
 tests/tcg/x86_64/fma.c           | 116 +++++
 fpu/softfloat-parts.c.inc        | 136 ++++-
 tests/tcg/x86_64/Makefile.target |   1 +
 42 files changed, 2443 insertions(+), 691 deletions(-)
 create mode 100644 tests/tcg/x86_64/fma.c

-- 
2.34.1
