[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [lmi] MinGW-w64 anomaly?
From: |
Greg Chicares |
Subject: |
Re: [lmi] MinGW-w64 anomaly? |
Date: |
Sun, 18 Dec 2016 22:18:07 +0000 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Icedove/45.4.0 |
On 2016-12-18 20:54, Vadim Zeitlin wrote:
> On Sun, 18 Dec 2016 14:28:46 +0000 Greg Chicares <address@hidden> wrote:
>
> GC> Vadim--Could you please take a look at this? It's an extremely simplified
> GC> test case, but it's important for the way lmi performs rounding.
> GC>
> GC> Apply the patch below [0] and run:
> GC> make unit_tests unit_test_targets=sandbox_test.exe
> GC>
> GC> Cross-compiling for msw on debian with MinGW-w64 gcc-4.9.1, I see:
>
> [Let me start with a digression not affecting lmi directly, so please feel
> free to skip it completely]
>
> Interestingly enough, this discrepancy doesn't appear if -std=c++11 option
> is not used: I started testing this outside of lmi initially and saw the
> same results in all 4 cases. I had to add -std=c++11 to see the same thing
> as in lmi unit test. This was surprising, so I decided to check why was it
> the case and it turned out that <cmath> has a std::pow() overload for
> "(long double, int)" using __builtin_powil() in pre-C++11 mode, but uses
> __builtin_powl() for all types of arguments for long double in C++11. And
> according to http://en.cppreference.com/w/cpp/numeric/math/pow this is
> correct, however I think it might be one of a few cases in which this
> reference is wrong. It's quite understandable in this particular case
> because it seems that the standard went through several iterations and, at
> some point, these functions were indeed removed (see e.g. the proposal at
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3286.html#550).
> But the answer at http://stackoverflow.com/a/5627278/15275, written by
> Howard Hinnant himself, says that this overload finally hasn't been
> removed, so MinGW-w64 shouldn't have this "#if __cplusplus < 201103L" check
> in its <cmath> -- even though it does in the version we're using and still
> seems to be the case in the very latest version of libstdc++, see
> https://github.com/gcc-mirror/gcc/blob/6514c52372bb36e555843b0419d71cf55eda0409/libstdc++-v3/include/c_global/cmath#L395
> So I opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78851 to see if it
> could/should be fixed.
>
> [End of the digression]
Thank you very much. Then the anomaly really is upstream, in libstdc++.
At first reading, I was saddened by this:
http://stackoverflow.com/a/5627278/15275
| So while the freedom is there to shuffle pow(double, int) off to a
| separate algorithm, most implementors I'm aware of have given up on
| that strategy
until I realized that his example really showed this expression
pow(.1, 20)
being "optimized" as
0.1 * 0.1 * 0.1 ...
which does harm accuracy. However, if we rewrite the expression as
the presumably intended
pow(10.0L, -20)
and optimize it as
1.0L / (10.0L * 10.0L * 10.0L ...)
then accuracy improves.
IOW, instead of an optimized overload for
long double std::pow(long double, int)
I'm in effect looking for one like this:
long double std::pow(int, int)
although of course C++ doesn't overload on return type.
> GC> Running sandbox_test:
> GC> f2():
> GC> 0.0000000001000000000000000000018377930496 1/10^10
> GC> 0.0000000001000000000000000000397031165002 10^-10
>
> Just for the reference, here are the binary (i.e. exact) representations
> of these numbers in 128 bit IEEE format:
>
> 1.b7cdfd9d7bdba b7e0000000005b1p-34
> 1.b7cdfd9d7bdba b8a0000000002b5p-34
>
> where I added a space after the first 52 bits which is all we have in the
> in-memory double representation. This at least confirms that both values
> are rounded to the same 1.b7cdfd9d7bdbbp-34 when using 64 bit doubles.
Yet I really do want long doubles.
> GC> f3():
> GC> 0.0000000001000000000000000000018377930496 1/10^10
> GC> 0.0000000001000000000000000000018377930496 10^-10
> GC>
> GC> What concerns me is the second of those four numbers: it's less accurate
> GC> than the other three. If an 80-byte register is being spilled to 64-byte
> GC> storage, I'd expect the result to be less accurate than this. I just
> GC> can't see any way to explain it.
>
> FWIW I could confirm, using my recently found knowledge of
> -fexcess-precision option, that this is not related to spilling x87
> registers to memory. If I convert your example to the sensibly equivalent C
> version:
[...]
> and compile it with -fexcess-precision=standard (and still with -O2, of
> course), it doesn't change anything and I still get the same results.
>
> But in the C version it's very simple to see what goes on however: f3()
> basically just prints the constant value, while f2() calls _powl()
> function. So what you see is basically that the results of compile-time
> computation are not exactly the same as the results of run-time
> computation. And I can even explain this further: I'm pretty sure that
> compile-time results are computed using the same __builtin_powil() that is
> used in C++98 code and it's not really surprising that its much simpler
> algorithm using just multiplications and inversion gives different results
> from the general-purpose powl(). Notice that __builtin_powil() is, of
> course, fully inlined and you get efficiently computed x^-10 by
> successively computing, using the single multiplication instruction, x,
> x^2, x^3, x^5, x^10 and then inverting it using just one other instruction,
> so it's clearly not something powl() can do -- it's already nice that it
> gives the same result as compile-time calculation for x^10 itself.
>
> BTW, if you add -fwhole-program (or -flto, but I can't examine its results
> as easily as the assembler code) option to gcc command line, then f2()
> would be reduced to the same trivial code (and both functions will be
> inlined directly into main() too).
This discussion leads me to think that instead of hoping for std::pow()
to do exactly what I want, I might try [untested sketch]:
long double lmi_powli(long double base, int iexp)
{
if(i < 0)
return 1.0L / lmi_powli(base, -iexp);
else
return std::pow(base, iexp);
}
Then, when 'base' has an exact integer value, a standard-conforming
implementation is at least highly likely to do what I want.
> GC> I can work around this by taking the reciprocal of the product of N tens,
> GC> which seems always to be as accurate as the best case with std::pow(),
> GC> but I'd rather avoid that.
It's probably time to reveal my motivation, which is to modernize and
simplify 'round_to.hpp'. I didn't want to state that earlier because
that file will soon become smaller and much clearer.
> I'm not sure what exactly is worth working around here. Both results are
> correct for the precision of standard double type and you would get the
> same output if you stored them in a "volatile double". But if you really
> want to get best performance and identical output in both cases, you could
> use __builtin_powil() manually in f2().
I think the lmi_powli() idea above accomplishes *almost* the same thing
without relying on gcc internals (so it's more portable across compilers
and across versions of gcc). I say *almost* because I think lmi_powli()
as sketched above depends on the FP rounding direction. I could of course
set the rounding direction explicitly, though that may be costly. But I
could build a cache at compile time, which might implicitly call the
builtin.
The motivation is to round a double to 'iexp' decimals, with various
rounding directions, e.g.:
floor(r * scale_fwd_) * scale_back_
trunc(r * scale_fwd_) * scale_back_
ceil (r * scale_fwd_) * scale_back_
where 'scale_fwd_' and 'scale_back_' are reciprocals that are both
long double powers of ten. It's similar to this idea:
http://florian.loitsch.com/publications/dtoa-pldi2010.pdf?attredirects=0
| Grisu needs a cache of precomputed powers-of-ten. The cache must be
| precomputed using high-precision integer arithmetic.
except that I'm trying to use 'long double' as my higher-precision
arithmetic.
- [lmi] MinGW-w64 anomaly?, Greg Chicares, 2016/12/18
- Re: [lmi] MinGW-w64 anomaly?, Vadim Zeitlin, 2016/12/18
- Re: [lmi] MinGW-w64 anomaly?,
Greg Chicares <=
- Re: [lmi] MinGW-w64 anomaly?, Vadim Zeitlin, 2016/12/18
- Re: [lmi] MinGW-w64 anomaly?, Greg Chicares, 2016/12/19
- Re: [lmi] MinGW-w64 anomaly?, Vadim Zeitlin, 2016/12/20
- Re: [lmi] MinGW-w64 anomaly?, Greg Chicares, 2016/12/20
- Re: [lmi] MinGW-w64 anomaly?, Vadim Zeitlin, 2016/12/21
- Re: [lmi] MinGW-w64 anomaly?, Greg Chicares, 2016/12/21
- Re: [lmi] MinGW-w64 anomaly?, Vadim Zeitlin, 2016/12/21
- Re: [lmi] MinGW-w64 anomaly?, Greg Chicares, 2016/12/21
- Re: [lmi] MinGW-w64 anomaly?, Vadim Zeitlin, 2016/12/21
- Re: [lmi] MinGW-w64 anomaly?, Greg Chicares, 2016/12/21