Re: [lmi] A use case for long double
From: Vadim Zeitlin
Subject: Re: [lmi] A use case for long double
Date: Sun, 1 May 2022 03:36:10 +0200
On Sat, 30 Apr 2022 17:46:05 +0000 Greg Chicares <gchicares@sbcglobal.net>
wrote:
GC> In preparation for migrating lmi releases from 32- to 64-bit binaries,
GC> I've been reconsidering lmi's use of type 'long double'. I postulate
GC> that 'long double' should not be used in place of 'double' without a
GC> convincing rationale, because it's less common in practice and because
GC> it's presumably slower for x86_64.
FWIW, this is exactly what I thought too...
GC> I had anticipated that IRR calculations would be faster, though
GC> somewhat (but perhaps tolerably) less accurate using binary64.
GC> However, see:
GC>
https://git.savannah.nongnu.org/cgit/lmi.git/commit/?h=odd/eraseme_long_double_irr
GC> It looks like we should keep the existing binary80 IRR code, because
GC> it's no slower, and achieves an extra two digits of precision in a
GC> not-implausible test case.
GC>
GC> The apparent lack of a speed penalty came as a surprise to me,
GC> but we follow the evidence wherever it may lead.
Yes, but it would be really nice to understand how this is possible: I
just don't understand how the legacy x87 part of the hardware could be
faster (even in absolute terms, not just "per bit of precision") than the
much more recent SSE instructions. I wonder whether the results would
improve with more aggressive code generation options: could you perhaps
rerun the benchmarks with the -march=native compilation option? This seems
unlikely to help, but maybe the compiler generates very suboptimal SSE
code because, by default, it keeps compatibility with some very old
micro-architectures.
GC> It would be conceivable to do IRR calculations using expm1() and
GC> log1p(), but that doesn't seem attractive. The principal part of
GC> the calculation is evaluation of NPV, the inner product of a stream
GC> of values ("cash flows") and a vector of powers (1+i)^n, n=0,1,2...
Shouldn't this calculation be vectorizable, then? I'm sorry, I haven't
looked at the details yet, but if there is any chance of vectorizing a
loop, it would be worth doing, as this could result in really spectacular
gains.
In any case, thanks for sharing the results of your analysis. They're not
what I expected (which, knowing that benchmarks always give surprising
results, is exactly what I should have expected), but still good to know.
VZ