Re: [Qemu-devel] [PATCH] softfloat: Fix division
From: Alex Bennée
Subject: Re: [Qemu-devel] [PATCH] softfloat: Fix division
Date: Wed, 03 Oct 2018 10:28:31 +0100
User-agent: mu4e 1.1.0; emacs 26.1.50
Richard Henderson <address@hidden> writes:
> The __udiv_qrnnd primitive that we nicked from gmp requires its
> inputs to be normalized. We were not doing that. Because the
> inputs are nearly normalized already, finishing that is trivial.
> Inline div128to64 into its one user, because it is no longer a
> general-purpose routine.
>
> Fixes: cf07323d494
> Fixes: https://bugs.launchpad.net/qemu/+bug/1793119
> Signed-off-by: Richard Henderson <address@hidden>
> ---
> include/fpu/softfloat-macros.h | 48 -----------------------
> fpu/softfloat.c | 72 ++++++++++++++++++++++++++++++----
> 2 files changed, 64 insertions(+), 56 deletions(-)
>
> diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
> index 35e1603a5e..f29426ac46 100644
> --- a/include/fpu/softfloat-macros.h
> +++ b/include/fpu/softfloat-macros.h
> @@ -625,54 +625,6 @@ static inline uint64_t estimateDiv128To64(uint64_t a0, uint64_t a1, uint64_t b)
>
> }
>
> -/* From the GNU Multi Precision Library - longlong.h __udiv_qrnnd
> - * (https://gmplib.org/repo/gmp/file/tip/longlong.h)
> - *
> - * Licensed under the GPLv2/LGPLv3
> - */
> -static inline uint64_t div128To64(uint64_t n0, uint64_t n1, uint64_t d)
> -{
> - uint64_t d0, d1, q0, q1, r1, r0, m;
> -
> - d0 = (uint32_t)d;
> - d1 = d >> 32;
> -
> - r1 = n1 % d1;
> - q1 = n1 / d1;
> - m = q1 * d0;
> - r1 = (r1 << 32) | (n0 >> 32);
> - if (r1 < m) {
> - q1 -= 1;
> - r1 += d;
> - if (r1 >= d) {
> - if (r1 < m) {
> - q1 -= 1;
> - r1 += d;
> - }
> - }
> - }
> - r1 -= m;
> -
> - r0 = r1 % d1;
> - q0 = r1 / d1;
> - m = q0 * d0;
> - r0 = (r0 << 32) | (uint32_t)n0;
> - if (r0 < m) {
> - q0 -= 1;
> - r0 += d;
> - if (r0 >= d) {
> - if (r0 < m) {
> - q0 -= 1;
> - r0 += d;
> - }
> - }
> - }
> - r0 -= m;
> -
> - /* Return remainder in LSB */
> - return (q1 << 32) | q0 | (r0 != 0);
> -}
> -
>
> /*----------------------------------------------------------------------------
> | Returns an approximation to the square root of the 32-bit significand given
> | by `a'. Considered as an integer, `a' must be at least 2^31. If bit 0 of
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 9405f12a03..93edc06996 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -1112,19 +1112,75 @@ static FloatParts div_floats(FloatParts a, FloatParts b, float_status *s)
> bool sign = a.sign ^ b.sign;
>
> if (a.cls == float_class_normal && b.cls == float_class_normal) {
> - uint64_t temp_lo, temp_hi;
> + uint64_t n0, n1, d0, d1, q0, q1, r1, r0, m, d;
> int exp = a.exp - b.exp;
> +
> + /*
> + * We want the 2*N / N-bit division to produce exactly an N-bit
> + * result, so that we do not have to renormalize afterward.
> + * If A < B, then division would produce an (N-1)-bit result;
> + * shift A left by one to produce an N-bit result, and
> + * decrement the exponent to match.
> + *
> + * The udiv_qrnnd algorithm that we're using requires normalization,
> + * i.e. the msb of the denominator must be set. Since we know that
> + * DECOMPOSED_BINARY_POINT is msb-1, everything must be shifted left
> + * by one more.
> + */
> if (a.frac < b.frac) {
> exp -= 1;
> - shortShift128Left(0, a.frac, DECOMPOSED_BINARY_POINT + 1,
> - &temp_hi, &temp_lo);
> + shortShift128Left(0, a.frac, DECOMPOSED_BINARY_POINT + 2, &n1, &n0);
> } else {
> - shortShift128Left(0, a.frac, DECOMPOSED_BINARY_POINT,
> - &temp_hi, &temp_lo);
> + shortShift128Left(0, a.frac, DECOMPOSED_BINARY_POINT + 1, &n1, &n0);
> }
> - /* LSB of quot is set if inexact which roundandpack will use
> - * to set flags. Yet again we re-use a for the result */
> - a.frac = div128To64(temp_lo, temp_hi, b.frac);
> + d = b.frac << 1;
> +
> + /*
> + * From the GNU Multi Precision Library - longlong.h __udiv_qrnnd
> + * (https://gmplib.org/repo/gmp/file/tip/longlong.h)
> + * Licensed under the GPLv2/LGPLv3.
> + */
> + d0 = (uint32_t)d;
> + d1 = d >> 32;
> +
> + r1 = n1 % d1;
> + q1 = n1 / d1;
> + m = q1 * d0;
> + r1 = (r1 << 32) | (n0 >> 32);
> + if (r1 < m) {
> + q1 -= 1;
> + r1 += d;
> + if (r1 >= d) {
> + if (r1 < m) {
> + q1 -= 1;
> + r1 += d;
> + }
> + }
> + }
> + r1 -= m;
> +
> + r0 = r1 % d1;
> + q0 = r1 / d1;
> + m = q0 * d0;
> + r0 = (r0 << 32) | (uint32_t)n0;
> + if (r0 < m) {
> + q0 -= 1;
> + r0 += d;
> + if (r0 >= d) {
> + if (r0 < m) {
> + q0 -= 1;
> + r0 += d;
> + }
> + }
> + }
> + r0 -= m;
> + /* End of __udiv_qrnnd. */
> +
> + /* Adjust remainder for normalization step. */
> + r0 >>= 1;
> +
> + /* Set lsb if there is a remainder, to set inexact. */
> + a.frac = (q1 << 32) | q0 | (r0 != 0);
> a.sign = sign;
> a.exp = exp;
> return a;
I guess if we get to the 80 bit stuff this will need to be commonised
again?
Anyway:
Reviewed-by: Alex Bennée <address@hidden>
Tested-by: Alex Bennée <address@hidden>
With Emilio's fp-test the only non-special ext80 failures left seem to be
some flag setting in various flavours of mulAdd. For example:
10:27:23 address@hidden:~/l/q/q/t/fp] testing/generic-op-tester 1 + ./fp-test
f16_mulAdd f32_mulAdd f64_mulAdd
>> Testing f16_mulAdd, rounding near_even, tininess before rounding
6133248 tests total.
Errors found in f16_mulAdd, rounding near_even, tininess before rounding:
+00.000 +1F.000 +1F.3DE => +1F.3DE ..... expected -1F.3FF v....
+00.000 +1F.000 +1F.3FF => +1F.3FF ..... expected -1F.3FF v....
+00.000 +1F.000 +1F.3FE => +1F.3FE ..... expected -1F.3FF v....
+00.000 +1F.000 -1F.3FF => -1F.3FF ..... expected -1F.3FF v....
+00.000 +1F.000 -1F.3FE => -1F.3FE ..... expected -1F.3FF v....
--
Alex Bennée