Re: [lmi] Transcendentals faster on linux than msw (wine)?
From: Vadim Zeitlin
Subject: Re: [lmi] Transcendentals faster on linux than msw (wine)?
Date: Tue, 6 Oct 2020 17:24:05 +0200
On Tue, 6 Oct 2020 14:16:19 +0000 Greg Chicares <gchicares@sbcglobal.net> wrote:
GC> Vadim--Consider lmi's 'i_from_i_upper_n_over_n', implemented thus:
GC> // naively: (1+i)^n - 1
GC> // substitute: (1+i)^n - 1 <-> std::expm1(std::log1p(i) * n)
GC> long double z = std::expm1l(std::log1pl(i) * n);
GC> That seems to run much faster on GNU/Linux than on msw as "emulated"
GC> by 'wine'. But how can that be, since this algorithm should just map
GC> onto a small number of machine instructions and msw or 'wine' shouldn't
GC> matter at all?
My only idea is that the implementation of mathematical functions in glibc
is different from, and better than, their implementation in MinGW. This is
not impossible: at least we know that the two use different implementations,
because MinGW can't reuse the glibc ones for some (licensing?) reasons. I
don't remember the details, but we had a bug with rounding in MinGW that
could have been trivially fixed by taking the glibc code, and the
maintainers refused to do it. Of course, it is still surprising, but this
seems more likely than Wine being to blame for the slowdown.
BTW, I'm also surprised that the 64-bit version is slightly slower than the
32-bit one. I have no idea at all how to explain this one...
GC> Conversely, the unit test's 'i_upper_n_over_n_from_i_naive':
GC> T operator()(T const& i) const
GC> {return T(-1) + std::pow((T(1) + i), T(1) / n);}
GC> using pow() seems three times as fast for both i686-w64-mingw32 and
GC> x86_64-w64-mingw32 than for x86_64-pc-linux-gnu.
In principle, this could be explained by glibc having "better QoI" (e.g.
better error checking) at the cost of a slower implementation than MinGW's.
I have no idea if this is really true or not, of course.
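For reference, the two formulations being compared can be written side by
side. This is only a minimal sketch, not lmi's code: the function names are
made up for illustration, and it uses plain double where the quoted unit
test is a template over T.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper names; n is the number of periods per year.

// Naive form, as in the quoted unit test: (1+i)^(1/n) - 1.
inline double periodic_rate_pow(double i, int n)
{
    assert(0 < n);
    return -1.0 + std::pow(1.0 + i, 1.0 / n);
}

// Rewritten form: expm1(log1p(i) / n). It never forms 1+i explicitly
// and never subtracts 1 from a value close to 1.
inline double periodic_rate_expm1(double i, int n)
{
    assert(0 < n);
    return std::expm1(std::log1p(i) / n);
}
```

With i = 0.01 and n = 365 both should give roughly 2.726e-05. In the quoted
data the two double precision results agree to about twelve significant
digits.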
GC> Raw data: i686-w64-mingw32-gcc-8.3-win32
GC>
GC> Speed tests:
GC> std::pow 1.463e-06 s mean; 1 us least of 6836 runs
GC> std::expm1 1.239e-06 s mean; 1 us least of 8073 runs
GC> double i365 7.609e-07 s mean; 1 us least of 13144 runs
GC> long double i365 7.520e-07 s mean; 1 us least of 13299 runs
GC>
GC> Daily rate corresponding to 1% annual interest, by various methods:
GC> 000000000111111111122
GC> 123456789012345678901
GC> 0.0000272615520089941669031 method in production
GC> 0.0000272615520089941669031 long double precision, std::expm1 and std::log1p
GC> 0.0000272615520089941739887 long double precision, std::pow
GC> 0.0000272615520089941672124 double precision, std::expm1 and std::log1p
GC> 0.0000272615520089392049385 double precision, std::pow
GC>
GC> x86_64-w64-mingw32-gcc-8.3 raw data:
GC>
GC> Speed tests:
GC> std::pow 1.602e-06 s mean; 2 us least of 6245 runs
GC> std::expm1 1.304e-06 s mean; 1 us least of 7667 runs
GC> double i365 7.516e-07 s mean; 1 us least of 13306 runs
GC> long double i365 7.630e-07 s mean; 1 us least of 13108 runs
GC>
GC> Daily rate corresponding to 1% annual interest, by various methods:
GC> 000000000111111111122
GC> 123456789012345678901
GC> 0.0000272615520089941669031 method in production
GC> 0.0000272615520089941672124 double precision, std::expm1 and std::log1p
GC> 0.0000272615520089392049385 double precision, std::pow
GC>
GC> pc-linux-gnu gcc-9 raw data:
GC>
GC> Speed tests:
GC> std::pow 4.565e-06 s mean; 2 us least of 2191 runs
GC> std::expm1 8.252e-07 s mean; 0 us least of 12119 runs
GC> double i365 7.233e-08 s mean; 0 us least of 138246 runs
GC> long double i365 1.721e-07 s mean; 0 us least of 58098 runs
GC>
GC> Daily rate corresponding to 1% annual interest, by various methods:
GC> 000000000111111111122
GC> 123456789012345678901
GC> 0.0000272615520089941669014 method in production
GC> 0.0000272615520089941672124 double precision, std::expm1 and std::log1p
GC> 0.0000272615520089392049385 double precision, std::pow
Looking at this, one thing I can't help noticing is that the double
precision results are exactly the same across all 3 builds (I just put the
cursor over them and pressed "*" in Vim to see it: all 3 matches are
highlighted), while the production method gives different results under MSW
and Linux. IMO this is a strong argument in favour of replacing the
production method with something else: getting the same results on all
platforms is highly desirable (again, IMO, of course).
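One way to compare results across builds without depending on how each C
library prints decimals is to dump the raw bit pattern. A rough sketch,
assuming a 64-bit IEEE 754 double (the helper names are hypothetical, not
part of lmi):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: the object representation of a double as an integer.
inline std::uint64_t bits_of(double x)
{
    static_assert(sizeof(std::uint64_t) == sizeof(double), "expects 64-bit double");
    std::uint64_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

// Hypothetical helper: the same bits as 16 lowercase hex digits, so that
// outputs of different builds can be compared with a plain text diff.
inline std::string hex_bits(double x)
{
    static char const digits[] = "0123456789abcdef";
    std::uint64_t u = bits_of(x);
    std::string s(16, '0');
    for(int j = 15; 0 <= j; --j)
    {
        s[j] = digits[u & 0xf];
        u >>= 4;
    }
    assert(16 == s.size());
    return s;
}
```

Two builds that print the same hex_bits() string for a rate have computed
exactly the same double. This does not cover long double, whose size and
format differ between platforms.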
But otherwise I can't say anything, sorry. The results of the speed tests
do look strange, but I just don't know why. Looking at the disassembly
could at least verify whether the code is the same or different, but in
either case I'm not sure what we could actually do about it. I guess if the
glibc version[*] is really better (and it does seem to have a few
non-trivial optimizations), we could just use it instead of the standard one?
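To check whether the difference really comes from the library functions,
the calls could also be timed in isolation, outside lmi's test harness.
Below is a minimal sketch using std::chrono. The helper name and the way
the argument is varied are my own assumptions, and this is not how lmi's
speed tests are implemented.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Hypothetical helper: mean nanoseconds per call of f over reps calls.
template<typename F>
double mean_ns_per_call(F f, int reps)
{
    assert(0 < reps);
    volatile double sink = 0.0; // volatile store keeps the calls from being optimized away
    auto const t0 = std::chrono::steady_clock::now();
    for(int j = 0; j < reps; ++j)
    {
        // Vary the argument slightly so the result can't be hoisted out of the loop.
        sink = f(0.01 + 1.0e-9 * j);
    }
    auto const t1 = std::chrono::steady_clock::now();
    (void)sink;
    std::chrono::duration<double, std::nano> const elapsed = t1 - t0;
    return elapsed.count() / reps;
}
```

It could be called, for example, as
mean_ns_per_call([](double i) {return std::expm1(std::log1p(i) / 365);}, 1000000)
and likewise with the std::pow expression. Running the same binary natively
and under Wine would then show whether the C library or the environment
makes the difference.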
Regards,
VZ
[*]
https://github.com/bminor/glibc/blob/43b1048ab9418e902aac8c834a7a9a88c501620a/sysdeps/ieee754/dbl-64/s_expm1.c