Re: [lmi] Contemplating the infinite
From: Greg Chicares
Subject: Re: [lmi] Contemplating the infinite
Date: Sat, 26 Jun 2010 01:45:59 +0000
User-agent: Thunderbird 2.0.0.24 (Windows/20100228)
On 2010-06-25 08:29Z, Vadim Zeitlin wrote:
> On Fri, 25 Jun 2010 01:26:49 +0000 Greg Chicares <address@hidden> wrote:
>
> GC> Other occurrences of DBL_MAX and std::numeric_limits<double>::max() (which
> GC> means the same thing) remain in HEAD. I'm not going to consider changing
> GC> them now if they aren't directly related to xml, product files, or the GUI,
> GC> particularly in light of this article
> GC> http://www.cygnus-software.com/papers/x86andinfinity.html
> GC> that cautions about performance problems.
>
> This is definitely an interesting link, I had no idea there could be such
> a huge slowdown when operating with infinities or NANs on x87. However the
> article also says that SSE unit doesn't suffer from this penalty at all so
> this could be another reason to switch to using it as discussed before.
That paper focuses on the Intel P4, as does this one:
http://web.archive.org/web/20080225034838/http://www.sph.sc.edu/comd/rorden/simd.html
| processing time required to multiply ... 1x1 [versus] 1xNAN
| With an AMD Athlon, there is no difference
| The Pentium 3 is 14 times slower for NaN calcuations
| the Pentium 4 is 135 times slower
This message suggests that the P4 penalty might have been an aberration:
http://www.mathworks.it/matlabcentral/newsreader/view_thread/156537
| I think (but can't find reference to) the newer Core Architecture
| (Core 2 Duos and such) performs better here. AMD processors CERTAINLY
| perform better with these.
...and this seems to say Core2 is much *faster* with NaNs:
https://www-old.cae.wisc.edu/pipermail/help-octave/2008-June/009513.html
| > Pentium 4 (Family 15, Model 2), Octave 3.0.1 MSVC2005 SSE2
| > octave-3.0.1.exe:2> a=zeros(300,300);tic;b=(1.0+a)*a;toc
| > Elapsed time is 0.0257161 seconds.
| > octave-3.0.1.exe:3> a=zeros(300,300);tic;b=(NaN+a)*a;toc
| > Elapsed time is 15.7125 seconds.
| >
| > AMD Turion 64 X2, Octave 3.0.1 MSVC2008 SSE3
| > octave-3.0.1.exe:3> a=zeros(300,300);tic;b=(1.0+a)*a;toc
| > Elapsed time is 0.0244939 seconds.
| > octave-3.0.1.exe:4> a=zeros(300,300);tic;b=(NaN+a)*a;toc
| > Elapsed time is 0.0251131 seconds.
[...]
| Core2 T7200, 2.00GHz, Octave 3.0.1, Goto BLAS 1.22, Gentoo x86_64, Dell
| M90 Laptop
| octave:1> a=zeros(300,300);tic;b=(1.0+a)*a;toc
| Elapsed time is 0.082047 seconds.
| octave:2> a=zeros(300,300);tic;b=(NaN+a)*a;toc
| Elapsed time is 0.0106151 seconds.
That might seem surprising, but it's perfectly plausible. I can do almost
any NaN computation in my head, and recite the answer backwards.
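To repeat the comparison outside Octave, something like the following C++ sketch might do. It is only an assumption-laden stand-in for the benchmarks quoted above: it times a simple multiply-and-accumulate loop rather than a BLAS matrix product, the names sum_of_products and timed_seconds are hypothetical, and whether x87 or SSE instructions are generated depends on compiler options (for gcc, -mfpmath=387 versus -mfpmath=sse).

```cpp
#include <chrono>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper: multiply each element by 'factor' and accumulate.
// If 'factor' is a NaN, every product and hence the sum is a NaN.
double sum_of_products(std::vector<double> const& v, double factor)
{
    double sum = 0.0;
    for(double x : v)
        {
        sum += x * factor;
        }
    return sum;
}

// Hypothetical helper: time one pass over a million ones.
// Stores the computed sum in 'result' so the work cannot be discarded,
// and returns the elapsed wall-clock time in seconds.
double timed_seconds(double factor, double& result)
{
    std::vector<double> const v(1000000, 1.0);
    auto const t0 = std::chrono::steady_clock::now();
    result = sum_of_products(v, factor);
    auto const t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Calling timed_seconds() once with 1.0 and once with std::numeric_limits<double>::quiet_NaN() gives the two timings to compare. A single pass is noisy, so repeated runs would be needed before drawing any conclusion about a particular CPU.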
> It seems that SSE is generally more predictable than x87, probably because
> it doesn't have any backwards compatibility concerns going 30 years back to
> take care of. And increased precision of x87 (which can cause hard to understand
> problems because of the use of different formats for storage of doubles and
> FPU registers) aside, this seems to be a good enough reason to prefer SSE
> even in spite of the possible performance gains.
Prof. Kahan says:
http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf
| Extra-precise arithmetic attenuates the risk of chagrin due to roundoff.
and he can demonstrate real problems that extended precision solves.
OTOH, its opponents can demonstrate real problems that extended precision
introduces, such as double rounding and nondeterministic register spillage:
http://hal.archives-ouvertes.fr/docs/00/28/14/29/PDF/floating-point-article.pdf
And x87 in particular has only XFmode instructions, not [DS]Fmode:
http://gcc.gnu.org/ml/gcc/2003-08/msg01195.html
For the 8087, with a budget of 45000 transistors, maybe that was good
enough; but, as you point out, that was 1980. OTOH, SSE has no format
wider than 64 bits. Eighty-bit long double isn't a disco-era mistake;
it's truly useful, but PCs have become multimedia appliances and
scientific computing is less important to hardware manufacturers now.
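Which regime a given build actually uses can be checked from within C++. The sketch below is illustrative only: the function names are hypothetical, and the descriptive strings merely paraphrase the standard meanings of FLT_EVAL_METHOD. It reports the significand widths of double and long double, and how intermediate results are evaluated.

```cpp
#include <cfloat>
#include <limits>
#include <string>

// Significand width in bits: 53 for IEEE 754 double.
int double_digits()
{
    return std::numeric_limits<double>::digits;
}

// 64 for the x87 eighty-bit format; other platforms may differ.
int long_double_digits()
{
    return std::numeric_limits<long double>::digits;
}

// Hypothetical helper: describe how the compiler evaluates
// floating-point expressions, according to FLT_EVAL_METHOD.
std::string describe_eval_method()
{
#if defined FLT_EVAL_METHOD
    switch(FLT_EVAL_METHOD)
        {
        case 0:  return "each operation is evaluated in its own type";
        case 1:  return "float and double are evaluated as double";
        case 2:  return "everything is evaluated as long double";
        default: return "indeterminate or implementation-defined";
        }
#else
    return "FLT_EVAL_METHOD is not defined";
#endif
}
```

A value of 2 would be expected for a build that uses x87 arithmetic and 0 for one that uses SSE, though that is an assumption about typical compilers rather than a guarantee.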
I think both schools of thought are reasonable. I tend to be in Kahan's
camp, and see no compelling reason to switch sides.