lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[lmi] (unsigned long long int)(float)(ULLONG_MAX-2)


From: Greg Chicares
Subject: [lmi] (unsigned long long int)(float)(ULLONG_MAX-2)
Date: Fri, 24 Mar 2017 14:25:57 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Icedove/45.6.0

Test case (restated from earlier messages): attempt round-trip
conversion between these types (abbreviated for ease of reference):
  ULL  meaning  unsigned long long (64 bits here)
  FLT  meaning  float (IEEE 754-2008 'binary32')
for a value near ULLONG_MAX, thus: ULL --> FLT --> ULL. Boost-1.33.1's
numeric_cast refuses (correctly, I think) to perform the second
conversion: it throws boost::numeric::positive_overflow. However, my
experimental branch's bourn_cast behaves differently (wrongly, I
think): it completes the round trip, but ends with a value of zero.

Obviously I want to know how boost decides to throw in this case.

The particular value (V) I'm using is (ULLONG_MAX - 2) = (2^64 - 3).
(A value exactly equal to ULLONG_MAX would work about as well, but I
wanted a value distinct from the highest representable ULL.)
      18446744073709551616 =  2^64      > ULLONG_MAX
      18446744073709551615 = (2^64 - 1) = ULLONG_MAX
  V = 18446744073709551613 = (2^64 - 3) = ULLONG_MAX - 2

Now
  std::cout.precision(30);
  std::cout << static_cast<float>(ULLONG_MAX - 2) << std::endl;
prints
  18446744073709551616
which is exactly 2^64, as expected. (The mantissa is exactly one; it
and the exponent are exactly representable in the binary32 format.
That format has only 24 mantissa bits (counting the implicit bit),
so V and its adjacent FLT values are:
  18446742974197923840 lower FLT neighbor
  18446744073709551613 V (not exactly representable as FLT)
  18446744073709551616 = (float)(V)
  18446746272732807168 upper FLT neighbor
so 2^64 is the closest representable FLT).

Thinking in infinite precision, we can easily see that a nonnegative
integral value can be converted to ULL iff it is <= ULLONG_MAX:
      18446744073709551616 =  2^64       > ULLONG_MAX: not convertible
      18446744073709551615 = (2^64 - 1)  = ULLONG_MAX: convertible
  V = 18446744073709551613 = (2^64 - 3) <= ULLONG_MAX: convertible

On the other hand, programming in C or C++, having converted V from
ULL to FLT, how can we tell whether the result can be converted back
to ULL? Define
  ULL_V     = numeric_cast<ULL>(V)          = 18446744073709551613
  ULL_LIMIT = numeric_cast<ULL>(ULLONG_MAX) = 18446744073709551615
  FLT_V     = numeric_cast<FLT>(V)          = 18446744073709551616
  FLT_LIMIT = numeric_cast<FLT>(ULLONG_MAX) = 18446744073709551616
We can't tell by comparing FLT_V to FLT_LIMIT: they're identical.
But that's the comparison that would be performed by this code:
  bool convertible = FLT_V <= ULLONG_MAX;
or, equivalently,
  bool inconvertible = ULLONG_MAX < FLT_V; // throw if true
because C and C++ applies the "usual arithmetic conversions".

We also can't tell by attempting to override the usual conversions by
casting to ULL:
  bool convertible = (unsigned long long)(FLT_V) <= ULLONG_MAX;
    = (unsigned long long)(18446744073709551616) <= ULLONG_MAX;
    = OULL <= ULLONG_MAX; // always true...so..."convertible"!

As Cacciola observes:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1879.htm
| properly determining whether a source value is in range for a given
| destination type is tricky because involves comparing a value of a
| type with the boundary values of a different type, task that itself
| requires a numeric conversion. This is in fact so difficult that
| boost::numeric_cast<> had it wrong for certain cases for years, even
| after a number of revisions.

but how does his numeric_cast implementation accomplish it?

To find out, I studied the source code for boost-1.33.1, because that
is the version we're using in production. (It's old, but a glance at
the repository:

https://github.com/boostorg/numeric_conversion/tree/master/include/boost/numeric/conversion

suggests that nothing essential has changed.)

Restating the question in the terms the boost implementation uses:
  S Source = float
  T Target = unsigned long long
we want to understand how
  numeric_cast<T>(FLT_V), i.e.,
  numeric_cast<unsigned long long>((float)(ULLONG_MAX - 2))
decides to throw boost::numeric::positive_overflow.

We are converting from floating to integral, so boost would use
  GetRC_Float2Int
which adjusts bounds--e.g., for ToNearest rounding:
  LT_HalfPrevLoT
    static_cast<S>(bounds<T>::lowest()) - static_cast<S>(0.5)
  GT_HalfSuccHiT
    static_cast<S>(bounds<T>::highest()) + static_cast<S>(0.5)
And it uses
  Float2IntRounder
which leads to
  Float2IntRounderBase::nearbyint()
  class Float2IntRounder = Trunc<S>
and Trunc defines nearbyint() thus:
  return s < static_cast<S>(0) ? ceil(s) : floor(s) ;
(This is distinct from std::nearbyint().)

But none of this can explain how boost decides to throw, because the
value we seek to convert is unaffected by adding or subtracting one
(or one half), or by rounding it up or down (it's already an integer):

float const f = boost::numeric_cast<float>(ULLONG_MAX - 2);
std::cout.precision(30);
std::cout
    << f << " f\n"
    << (f+1.0f) << " (f+1.0f)\n"
    << std::ceil (f+1.0f) << " std::ceil (f+1.0f)\n"
    << std::floor(f+1.0f) << " std::floor(f+1.0f)\n"
    << (f-1.0f) << " (f-1.0f)\n"
    << std::ceil (f-1.0f) << " std::ceil (f-1.0f)\n"
    << std::floor(f-1.0f) << " std::floor(f-1.0f)\n"
    << std::endl;

18446744073709551616 f
18446744073709551616 (f+1.0f)
18446744073709551616 std::ceil (f+1.0f)
18446744073709551616 std::floor(f+1.0f)
18446744073709551616 (f-1.0f)
18446744073709551616 std::ceil (f-1.0f)
18446744073709551616 std::floor(f-1.0f)

It's hard to follow the code because it uses C++98 metaprogramming,
but clearly it's necessary to dig deeper; here's what I've abstracted
from it:

There are six comparisons:

LT_LoT          return s <  S(lowest())
LE_PrevLoT      return s <= S(lowest())  - S(1.0)
LT_HalfPrevLoT  return s <  S(lowest())  - S(0.5)

GT_HiT          return s >  S(highest())
GE_SuccHiT      return s >= S(highest()) + S(1.0)
GT_HalfSuccHiT  return s >= S(highest()) + S(0.5)  ["GT" vs ">=" ?]

[Curiously, on the last line, ">=" would suggest "GE", but "GT"
is written instead; either the operation or the name must be a
mistake, but I'm not sure which.]

>From those six, one pair is chosen depending on rounding style:

ToZero     LE_PrevLoT     GE_SuccHiT
ToNearest  LT_HalfPrevLoT GT_HalfSuccHiT
ToInf      LE_PrevLoT     GT_HiT
ToNegInf   LT_LoT         GE_SuccHiT

Putting that together, here are the comparisons used for each
rounding style:

ToZero     s <= S(lowest()) - S(1.0) || s >= S(highest()) + S(1.0)
ToNearest  s <  S(lowest()) - S(0.5) || s >= S(highest()) + S(0.5)
ToInf      s <= S(lowest()) - S(1.0) || s >  S(highest())
ToNegInf   s <  S(lowest())          || s >= S(highest()) + S(1.0)

That table, extracted from the code, seems to match the text in
Cacciola's documentation (modulo inversion--they're inverses in
that the table above gives rejection criteria, while the one below
gives acceptance criteria):

http://boost.cowic.de/rc/pdf/conversion.pdf page 14:

ToZero     (s >  S(LowestT)-S(1)  ) && (s <  S(HighestT)+S(1)  )
ToNearest  (s >= S(LowestT)-S(0.5)) && (s <  S(HighestT)+S(0.5))
ToInf      (s >  S(LowestT)-S(1)  ) && (s <= S(HighestT)       )
ToNegInf   (s >= S(LowestT)       ) && (s <  S(HighestT)+S(1)  )

Could it be that boost throws only because it uses a ">=" comparison
where bourn_cast might use ">"? Recall our result from above:

  18446744073709551616 =  2^64      > ULLONG_MAX: throw
  18446744073709551615 = (2^64 - 1) = ULLONG_MAX: convert

Assuming boost uses its ToNearest rules (because that's the rounding
direction lmi sets), numeric_cast's criteria would be:

   s < S(lowest()) - S(0.5) || s >= S(highest()) + S(0.5) // throw
   s >= S(LowestT) - S(0.5) && s <  S(HighestT)  + S(0.5) // convert

We're concerned only with the high end. In numeric_cast's terminology:
  S Source = float
  T Target = unsigned long long
so numeric_cast throws, for a float argument 's', if:

   s >= S(highest T) + S(0.5)
   s >= float(ULLONG_MAX) + float(0.5)

whereas the criterion in real numbers is
   s > ULLONG_MAX
which uses ">" instead of ">=". Yet numeric_cast does the right thing
for this example's argument, 18446744073709551616:

real:   18446744073709551616 >  18446744073709551615 = ULLONG_MAX
float:  18446744073709551616 >= (float)18446744073709551616.5f

because 18446744073709551616.5f is not representable and the fraction
is rounded off, leaving 18446744073709551616 on the RHS, which is
identical to the LHS. Thus, numeric_cast adds one-half (which makes no
difference), and uses ">=" (which seems wrong)...and correctly decides
to throw.

I don't see how this can be right, so let's try to prove that it's
wrong. My hypothesis is that ">=" has been incorrectly used where
">" would be correct--recall the observation above
  GT_HalfSuccHiT  return s >= S(highest()) + S(0.5)  ["GT" vs ">=" ?]
where "GT" is consistent with this hypothesis--and if so, I should be
able to find a case where numeric_cast does the wrong thing. A likely
way to accomplish this is by testing with ToInf and again with any
other rounding direction, in order to get both ">=" and ">" criteria
for throwing, which should behave differently.

At first, I thought that perhaps the rounding direction was determined
at runtime, e.g. by calling fegetround(), but that's not the case:
instead, it is a policy, specified as a template parameter. Apparently
there is no easy way to change this policy:

http://boost.2283326.n4.nabble.com/numeric-cast-td3602499.html
| As written there is no way for a user to specify a different
| overflow handler, rounding policy, or range checker. The defaults
| are always used when numeric_cast is called.

But there's another way to test the ">=" vs ">" hypothesis: convert
to long double instead of float, because an 80-bit long double has
64 mantissa bits (counting the implied bit). However:

    std::cout << "Now test (ULL)(long double)(ULLONG_MAX) instead\n";
    std::cout.precision(30);
    unsigned long long v = ULLONG_MAX; // No "-2".
    long double const d = boost::numeric_cast<long double>(v);
    unsigned long long u = boost::numeric_cast<unsigned long long>(d);
    std::cout
        << v << " v\n"
        << d << " d\n"
        << u << " u\n"
        << std::endl;

Now test (ULL)(long double)(ULLONG_MAX) instead
18446744073709551615 v
18446744073709551615 d
18446744073709551615 u

That seems to disprove my hypothesis. If boost were improperly using
a ">=" comparison as a rejection criterion, then "d >= ULLONG_MAX"
would be true and boost::numeric_cast would (erroneously) throw. But
instead of throwing, the cast (correctly) succeeds. Why?

New hypothesis: because of the bound adjustments.

Looking into the source again, for the Trunc policy which seems to be
the applicable default (above), we have std::round_toward_zero as the
default rounding style. Then, from the table above, the criterion for
rejection (throwing) is:

ToZero     s <= S(lowest()) - S(1.0) || s >= S(highest()) + S(1.0)

Now we are using
  S Source = long double (not float as above)
  T Target = unsigned long long (same as above)
so numeric_cast throws iff
  value >= (long double)(ULLONG_MAX) + (long double)(1ULL)
Above, in the real number system, I insist that
  value > ULLONG_MAX
is the correct criterion, but I would also agree that
  value >= ULLONG_MAX + 1
is correct for integers of unlimited precision. It's clear that this
works for long double, which has all the integer precision needed for
this comparison. But why does numeric_cast work for float? and can we
prove that it works in general--i.e., that ">=" in the floating domain
behaves like ">" in the real domain in exactly the cases where adding
one doesn't change the value...so that two ostensible mistakes always
cancel each other?

At this point, I was thinking of casting both values to long double,
which is precise enough for any comparison between lower-ranked types,
and running bourn_cast with that modification in parallel with boost's
implementation in the hope of finding an example where they disagree.
(Casting to long double is more precise than using std::isgreater().
Alternatively, any arithmetic values can be compared correctly by
working at the bit level, but that seems tedious and inelegant.)

However:

"If you can't solve a problem, then there is an easier problem you
can't solve: find it." [attributed to Polya by Conway]

We don't need a general technique to compare any floating value to any
integral value; we only want to know whether a floating value exceeds
some member of the INT_MAX family, knowing that every member is 2^n-1
for some integer n. Evidently Cacciola has replaced that problem with
an easier one, transforming the ill-conditioned
  X > 2^n - 1
which may require more mantissa bits than X has, into
  X >= 2^n
where the right-hand side is exactly representable in normalized
binary32 format for n as high as 126. (The maximum value of n doesn't
even matter, because we can easily tell if a hypothetical INT128_MAX
exceeds FLT_MAX.) I wonder whether he realizes this: perhaps not, as
his ToInf policy doesn't adjust the upper bound and would therefore
seem likely to give the wrong answer.

"Can you see it at a glance?" --Polya, "Can You Solve It?"

Using the original example above:

  float const f = boost::numeric_cast<float>(ULLONG_MAX - 2);

range of type float:  0x000000 - 0xfffff times 2^(+/-126 or so)
value of ULLONG_MAX:  0xffffffffffffffff

it may seem that we are trying to compare these two values:

  0xffffff.ffffffffff * 2^40
  0xffffff.fffffffffd * 2^40

but nothing really exists to the right of the binary point. The
actual value 'f' that we have is 0x1 * 2^64, and the comparison:

           f  <  ULLONG_MAX
  0x1 * 2^64  <  0xffffffffffffffff

is hard (float lacks the precision; unsigned long long lacks the
range), but we can easily solve the equivalent problem:

           f  <=  ULLONG_MAX + 1
        2^64  <=  2^64

The real reason for adding one isn't to adjust for the rounding
direction: it's to turn integer maxima into powers of two.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]