help-octave

Re: NaN slowdown with some processors


From: Jaroslav Hajek
Subject: Re: NaN slowdown with some processors
Date: Wed, 4 Jun 2008 14:01:32 +0200

On Wed, Jun 4, 2008 at 12:55 PM, Olli Saarela <address@hidden> wrote:
>> I'm planning to buy a new desktop machine, and since my computations
>> utilize NaN values heavily, I'd like to know whether Intel Core 2
>> processors suffer from the same slowdown with NaN values as Pentium. For
>> details, see http://www.cygnus-software.com/papers/x86andinfinity.html
>
> Thank you all, the data you have provided has clarified the issue. In
> addition to the replies posted to the list, I got some mail showing 100x
> slowdown with Core 2 / Debian / Octave 3.0.1. It looks like there still
> is a NaN-related slowdown in Core 2 when the computation isn't carried
> out using SSE2/3.
>
> If I have understood correctly, gcc can be forced to generate SSEn
> instructions, which avoids this performance degradation completely.
> There also seem to be a number of Linux installations of Octave out
> there that would benefit from such compile options.
>
> The situation is slightly different with MSVC. The documentation on MSDN
> says
>
>   The optimizer will choose when and how to make use of the SSE and SSE2
>   instructions when /arch is specified. SSE and SSE2 instructions will
>   be used for some scalar floating-point computations, when it is
>   determined that it is faster to use the SSE/SSE2 instructions and
>   registers rather than the x87 floating-point register stack. As a
>   result, your code will actually use a mixture of both x87 and SSE/SSE2
>   for floating-point computations.
>
> This might explain the NaN-related slowdown on Windows machines with
> Intel processors. Drawing (extrapolating) conclusions from the posted
> figures, MSVC2008&SSE3 seem to do a much better job in this respect than
> MSVC2005&SSE2, even though some performance degradation still remains.
>

Just to make things clear:
matrix multiplication is carried out by BLAS, either a system-supplied
one or a reference implementation that Octave compiles for you (which
is usually slower). Whether or not the SSEx abilities of your processor
are utilized depends on where you get your BLAS from. If you do not
provide an optimized BLAS (ATLAS, ACML, Intel's MKL), you need to make
sure that the reference BLAS really is compiled with SSE support. For
instance, if you have gfortran, then the SSE flags must go in FFLAGS!
Generally, it takes some playing around to find the Fortran flags that
get the most out of the reference BLAS, and that combination is not
necessarily the best one for the rest of libcruft. That's why, if you
care about performance, it's better to compile BLAS (and possibly
LAPACK) standalone, ideally an optimized version thereof (ATLAS,
GotoBLAS), or to get a precompiled optimized version (ACML, Intel MKL).
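
To check whether a given compiler and flag combination is affected at
all, a small timing test can help before rebuilding BLAS. The following
is only a minimal sketch in C, not code from Octave or from any BLAS:
the function names (fill, dot, time_dot, nan_slowdown) are made up for
illustration, and the naive dot product merely stands in for the kind
of inner loop a reference BLAS contains. The assumption is that you
build it twice with gcc on x86, once with -mfpmath=387 and once with
-msse2 -mfpmath=sse, and compare the ratios.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Fill x[0..n-1] with `value`; pass NAN to build the all-NaN case. */
void fill(double *x, size_t n, double value)
{
    for (size_t i = 0; i < n; i++)
        x[i] = value;
}

/* Naive dot product of x with itself, accumulated on top of `seed`.
   The seed changes on every repetition in time_dot, so the compiler
   cannot compute the sum once and reuse it. */
double dot(const double *x, size_t n, double seed)
{
    double s = seed;
    for (size_t i = 0; i < n; i++)
        s += x[i] * x[i];
    return s;
}

/* CPU seconds spent on `reps` dot products. The last result is stored
   in *out so that the work is not optimized away. */
double time_dot(const double *x, size_t n, int reps, double *out)
{
    assert(reps > 0);
    volatile double sink = 0.0;
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        sink = dot(x, n, (double)r);
    clock_t t1 = clock();
    *out = sink;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

/* Ratio (time with NaN data) / (time with finite data).
   Returns -1.0 if allocation fails or the finite run was too short
   for clock() to measure. */
double nan_slowdown(size_t n, int reps)
{
    double *x = malloc(n * sizeof *x);
    if (x == NULL)
        return -1.0;

    double out;
    fill(x, n, 1.0);
    double t_finite = time_dot(x, n, reps, &out);
    fill(x, n, NAN);
    double t_nan = time_dot(x, n, reps, &out);

    free(x);
    return t_finite > 0.0 ? t_nan / t_finite : -1.0;
}
```

Calling nan_slowdown with, say, a million elements and a few hundred
repetitions from a small main and printing the result should give a
ratio near 1 when the arithmetic is done in SSE registers, and a much
larger one if the x87 path is used on an affected processor. The
numbers will depend on the machine, so treat them as a hint only.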

cheers


> Thank you all once again!
>   Olli
>
> _______________________________________________
> Help-octave mailing list
> address@hidden
> https://www.cae.wisc.edu/mailman/listinfo/help-octave
>



-- 
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

