Re: [lmi] Transcendentals faster on linux than msw (wine)?
From: Greg Chicares
Subject: Re: [lmi] Transcendentals faster on linux than msw (wine)?
Date: Tue, 6 Oct 2020 23:25:14 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.12.0
On 2020-10-06 15:24, Vadim Zeitlin wrote:
> On Tue, 6 Oct 2020 14:16:19 +0000 Greg Chicares <gchicares@sbcglobal.net>
> wrote:
>
> GC> Vadim--Consider lmi's 'i_from_i_upper_n_over_n', implemented thus:
> GC> // naively: (1+i)^n - 1
> GC> // substitute: (1+i)^n - 1 <-> std::expm1(std::log1p(i) * n)
> GC> long double z = std::expm1l(std::log1pl(i) * n);
> GC> That seems to run much faster on GNU/Linux than on msw as "emulated"
> GC> by 'wine'. But how can that be, since this algorithm should just map
> GC> onto a small number of machine instructions and msw or 'wine' shouldn't
> GC> matter at all?
>
> My only idea is that the implementation of mathematical functions in glibc
> is different, and better, than their implementation in MinGW.
I've just pushed a new branch "odd/glibc-expm1-log1p" that continues
the investigation.
> Looking at this, one thing I can't help noticing is that double precision
> results are exactly the same across all 3 builds (I just put the cursor
> over them and press "*" in Vim to see it: all 3 matches are highlighted),
> while the production method gives different results under MSW and Linux.
> IMO this is a strong argument in favour of replacing the production method
> with something else, getting the same results under all platforms is highly
> desirable (again, IMO, of course).
Yes, particularly since the "production" method uses "long double",
which we shouldn't use for 64-bit builds without a strong reason.
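For reference, the two forms under discussion can be sketched in plain
"double" as below. This is only a minimal illustration, not lmi's actual
code: the function names are hypothetical, and taking 'n' as a double is
an assumption made for brevity.

```cpp
#include <cmath>

// Hypothetical helper names, for illustration only; not lmi's implementation.

// Naive form: (1+i)^n - 1.
// For very small i, the sum 1+i rounds away the low-order bits of i
// before the power is taken, so the result loses precision.
inline double naive_compound(double i, double n)
{
    return std::pow(1.0 + i, n) - 1.0;
}

// Substituted form: expm1(log1p(i) * n), using double rather than
// long double. log1p and expm1 avoid forming 1+i explicitly.
inline double expm1_log1p_compound(double i, double n)
{
    return std::expm1(std::log1p(i) * n);
}
```

For ordinary interest rates the two should agree closely; the substituted
form matters mostly when i is tiny, where the naive form can collapse to
zero.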
> But otherwise I can't say anything, sorry. The results of the speed tests
> do look strange, but I just don't know why. Looking at the disassembly
> could at least verify if the code is the same or different, but in either
> case I'm not sure what could we actually do about it. I guess if glibc
> version[*] is really better (and it does seem to have a few non-trivial
> optimizations), we could just use it instead of the standard one?
The last commit on "odd/glibc-expm1-log1p" suggests that the glibc
version is faster by a factor that's somewhat hard to believe.
I think I'll try building lmi with and without that glibc code,
and measure the clock time for the CLI '--self_test'. This 'perf'
report:
https://lists.nongnu.org/archive/html/lmi/2020-10/msg00016.html
| $sudo LD_LIBRARY_PATH=./usr/lib/x86_64-linux-gnu/ ./usr/bin/perf_4.19 report
|
| Samples: 24K of event 'cycles:ppp', Event count (approx.): 19181550079
| Overhead Command Shared Object Symbol
| 8.60% lmi_cli_shared libm-2.31.so [.] expm1f64x
| 6.24% lmi_cli_shared liblmi.so [.] AccountValue::DecrementAVProportionally
| 4.83% lmi_cli_shared liblmi.so [.] Irc7702A::DetermineLowestBft
| 4.46% lmi_cli_shared liblmi.so [.] AccountValue::TxSetDeathBft
| 2.84% lmi_cli_shared liblmi.so [.] AccountValue::SurrChg
suggests that lmi spends one-twelfth of its time in expm1()
(and log1p() took enough additional time to bring their total
to about ten percent)--and that's for pc-linux-gnu, which
already uses glibc. But I'm not sure I'm interpreting perf's
report correctly--does the output above really demonstrate
conclusively that such a large part of the '--self_test'
time is spent in expm1f64x? Or could it be detecting some
function that calls expm1, and (mis)attributing its total
time to expm1?