Re: [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub
From: Alex Bennée
Subject: Re: [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub
Date: Wed, 21 Oct 2020 18:46:36 +0100
User-agent: mu4e 1.5.6; emacs 28.0.50
Richard Henderson <richard.henderson@linaro.org> writes:
> Hi Alex,
>
> Here's my first adjustment to your conversion for 128-bit floats.
>
> The idea is to use a set of macros and an include file so that we
> can re-use the same large chunk of code that performs the basic
> operations on various fraction lengths. It's ugly, but without
> proper language support it seems to be less ugly than most.
>
> While I've just gone and added lots of stuff to int128... I have
> had another idea, half-baked because I'm tired and it's late:
>
> typedef struct {
>     FloatClass cls;
>     int exp;
>     bool sign;
>     uint64_t frac[];
> } FloatPartsBase;
>
> typedef struct {
>     FloatPartsBase base;
>     uint64_t frac;
> } FloatParts64;
>
> typedef struct {
>     FloatPartsBase base;
>     uint64_t frac_hi, frac_lo;
> } FloatParts128;
>
> typedef struct {
>     FloatPartsBase base;
>     uint64_t frac[4];  /* big endian word ordering */
> } FloatParts256;
>
> This layout, with the big-endian ordering, means that storage
> can be shared between them, just by ignoring the least significant
> words of the fraction as needed. Which may make muladd more
> understandable.
Would the big-endian word ordering hamper the compiler on x86, where it
can do extra-wide operations?

I am still seeing a multi-MFlop drop in performance when converting
float128_addsub to the new code. If this layout allows the compiler to
do better I can live with it.
>
> E.g.
>
> static void muladd_floats64(FloatParts128 *r, FloatParts64 *a,
>                             FloatParts64 *b, FloatParts128 *c, ...)
> {
>     // handle nans
>     // produce 128-bit product into r
>     // handle p vs c special cases
>     // zero-extend c to 128 bits
>     c->frac[1] = 0;
>     // perform 128-bit fractional addition
>     addsub_floats128(r, c, ...);
>     // fold 128-bit fraction to 64-bit sticky bit
>     r->frac[0] |= r->frac[1] != 0;
> }
>
> float64 float64_muladd(float64 a, float64 b, float64 c, ...)
> {
>     FloatParts64 pa, pb;
>     FloatParts128 pc, pr;
>
>     float64_unpack_canonical(&pa.base, a, status);
>     float64_unpack_canonical(&pb.base, b, status);
>     float64_unpack_canonical(&pc.base, c, status);
>     muladd_floats64(&pr, &pa, &pb, &pc, flags, status);
>
>     return float64_round_pack_canonical(&pr.base, status);
> }
>
> Similarly, muladd_floats128 would use addsub_floats256.
>
> However, the big-endian word ordering means that Int128
> cannot be used directly; so a set of wrappers are needed.
> If added the Int128 routine just for use here, then it's
> probably easier to bypass Int128 and just code it here.
Are you talking about all our operations? Will we still need to #ifdef
CONFIG_INT128 in the softfloat code?
>
> Thoughts?
>
>
> r~
>
>
> Richard Henderson (15):
> qemu/int128: Add int128_or
> qemu/int128: Add int128_clz, int128_ctz
> qemu/int128: Rename int128_rshift, int128_lshift
> qemu/int128: Add int128_shr
> qemu/int128: Add int128_geu
> softfloat: Use mulu64 for mul64To128
> softfloat: Use int128.h for some operations
> softfloat: Tidy a * b + inf return
> softfloat: Add float_cmask and constants
> softfloat: Inline float_raise
> Test split to softfloat-parts.c.inc
> softfloat: Streamline FloatFmt
> Test float128_addsub
> softfloat: Use float_cmask for addsub_floats
> softfloat: Improve subtraction of equal exponent
>
>  include/fpu/softfloat-macros.h |  89 ++--
>  include/fpu/softfloat.h        |   5 +-
>  include/qemu/int128.h          |  61 ++-
>  fpu/softfloat.c                | 802 ++++++++++-----------------------
>  softmmu/physmem.c              |   4 +-
>  target/ppc/int_helper.c        |   4 +-
>  tests/test-int128.c            |  44 +-
>  fpu/softfloat-parts.c.inc      | 339 ++++++++++++++
>  fpu/softfloat-specialize.c.inc |  45 +-
>  9 files changed, 716 insertions(+), 677 deletions(-)
> create mode 100644 fpu/softfloat-parts.c.inc
--
Alex Bennée