
Re: [lmi] Transcendentals faster on linux than msw (wine)?


From: Greg Chicares
Subject: Re: [lmi] Transcendentals faster on linux than msw (wine)?
Date: Tue, 6 Oct 2020 23:25:14 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.12.0

On 2020-10-06 15:24, Vadim Zeitlin wrote:
> On Tue, 6 Oct 2020 14:16:19 +0000 Greg Chicares <gchicares@sbcglobal.net> 
> wrote:
> 
> GC> Vadim--Consider lmi's 'i_from_i_upper_n_over_n', implemented thus:
> GC>     // naively:    (1+i)^n - 1
> GC>     // substitute: (1+i)^n - 1 <-> std::expm1(std::log1p(i) * n)
> GC>     long double z = std::expm1l(std::log1pl(i) * n);
> GC> That seems to run much faster on GNU/Linux than on msw as "emulated"
> GC> by 'wine'. But how can that be, since this algorithm should just map
> GC> onto a small number of machine instructions and msw or 'wine' shouldn't
> GC> matter at all?
> 
>  My only idea is that the implementation of mathematical functions in glibc
> is different, and better, than their implementation in MinGW.

I've just pushed a new branch "odd/glibc-expm1-log1p" that continues
the investigation.

>  Looking at this, one thing I can't help noticing is that double precision
> results are exactly the same across all 3 builds (I just put the cursor
> over them and press "*" in Vim to see it: all 3 matches are highlighted),
> while the production method gives different results under MSW and Linux.
> IMO this is a strong argument in favour of replacing the production method
> with something else, getting the same results under all platforms is highly
> desirable (again, IMO, of course).

Yes, particularly since the "production" method uses "long double",
which we shouldn't use for 64-bit builds without a strong reason.
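
For illustration only, here is a minimal sketch of the two variants under
discussion. The function names are hypothetical and this is not lmi's actual
code; it assumes only the formula quoted above, (1+i)^n - 1 computed as
expm1(log1p(i) * n), once in double and once in long double.

```cpp
#include <cmath>

// Hypothetical helper, not lmi's implementation:
// (1+i)^n - 1 evaluated entirely in double precision.
double compound_minus_one(double i, int n)
{
    return std::expm1(std::log1p(i) * n);
}

// Same formula evaluated in long double, then rounded to double.
// On x86_64 GNU/Linux long double is the 80-bit x87 format; other
// platforms may differ, which is one source of cross-platform drift.
double compound_minus_one_ld(double i, int n)
{
    long double const z =
        std::expm1(std::log1p(static_cast<long double>(i)) * n);
    return static_cast<double>(z);
}
```

Printing both results with "%.17g" for a few rates (say i = 0.01, n = 12)
would show whether they differ in the last digit or two on a given platform.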

>  But otherwise I can't say anything, sorry. The results of the speed tests
> do look strange, but I just don't know why. Looking at the disassembly
> could at least verify if the code is the same or different, but in either
> case I'm not sure what could we actually do about it. I guess if glibc
> version[*] is really better (and it does seem to have a few non-trivial
> optimizations), we could just use it instead of the standard one?

The last commit on "odd/glibc-expm1-log1p" suggests that the glibc
version is faster by a factor that's somewhat hard to believe.
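
A rough way to sanity-check such a factor in isolation might look like the
sketch below. It is not the timing code on the branch: the harness name
ns_per_call and the two wrapper functions are made up for this example, and
the fixed n = 12 and the argument range near 0.04 are arbitrary assumptions.

```cpp
#include <chrono>
#include <cmath>

// Hypothetical harness: mean nanoseconds per call of f over 'reps' calls,
// with the argument varied slightly so the result cannot be hoisted.
template<typename F>
double ns_per_call(F f, int reps)
{
    volatile double sink = 0.0; // prevents the calls from being optimized away
    auto const t0 = std::chrono::steady_clock::now();
    for(int j = 0; j < reps; ++j)
    {
        sink = sink + f(0.04 + 1.0e-9 * j);
    }
    auto const t1 = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> const elapsed = t1 - t0;
    return elapsed.count() / reps;
}

// (1+i)^12 - 1 in double precision.
double via_double(double i)
{
    return std::expm1(std::log1p(i) * 12);
}

// (1+i)^12 - 1 in long double, rounded to double at the end.
double via_long_double(double i)
{
    long double const z =
        std::expm1(std::log1p(static_cast<long double>(i)) * 12);
    return static_cast<double>(z);
}
```

Calling ns_per_call(via_double, 1000000) and
ns_per_call(via_long_double, 1000000) in the same binary, once built
natively and once cross-built and run under 'wine', would give per-call
costs that can be compared directly with the factor reported on the branch.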

I think I'll try building lmi with and without that glibc code,
and measure the clock time for the CLI '--self_test'. This 'perf'
report:

https://lists.nongnu.org/archive/html/lmi/2020-10/msg00016.html
| $sudo LD_LIBRARY_PATH=./usr/lib/x86_64-linux-gnu/ ./usr/bin/perf_4.19 report
|
| Samples: 24K of event 'cycles:ppp', Event count (approx.): 19181550079
| Overhead  Command         Shared Object        Symbol
|    8.60%  lmi_cli_shared  libm-2.31.so         [.] expm1f64x
|    6.24%  lmi_cli_shared  liblmi.so            [.] AccountValue::DecrementAVProportionally
|    4.83%  lmi_cli_shared  liblmi.so            [.] Irc7702A::DetermineLowestBft
|    4.46%  lmi_cli_shared  liblmi.so            [.] AccountValue::TxSetDeathBft
|    2.84%  lmi_cli_shared  liblmi.so            [.] AccountValue::SurrChg

suggests that lmi spends one-twelfth of its time in expm1()
(and log1p() took enough additional time to bring their total
to about ten percent)--and that's for pc-linux-gnu, which
already uses glibc. But I'm not sure I'm interpreting perf's
report correctly--does the output above really demonstrate
conclusively that such a large part of the '--self_test'
time is spent in expm1f64x? Or could it be detecting some
function that calls expm1, and (mis)attributing its total
time to expm1?

