


From: Oliver Jennrich
Subject: Re: [Help-gsl] Re: C/C++ speed optimization bible/resources/pointers needed, and about using GSL...
Date: Mon, 6 Aug 2007 16:29:51 +0200

On 7/27/07, Gordan Bobic <address@hidden> wrote:
> On Fri, 27 Jul 2007, Jochen Küpper wrote:
>
> >> [...example..]
> >
> >> Using floats instead of doubles can lead to quite significant performance
> >> differences.
> >
> > On your Pentium 3, which is not the average number cruncher these days.
> > An Opteron or any of the modern Intel CPUs would be more appropriate.
>
> *sigh*
>
> On an x86-64 Core2/1.9GHz, CentOS/x86-64 v5, ICC v9.1.051/x86-64
> Using the small sample program I posted earlier.
> Compiled with: icc -msse3 -xP -fp-model fast=2
>
> Using floats: 2.65 seconds
> Using doubles: 5.29 seconds
>
> Twice as many floats vectorize per operation as doubles. Thus it goes
> twice as fast. How much more evidence do you require?
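The "twice as many floats per operation" argument can be sketched with GCC/Clang vector extensions. This is a compiler feature not used in the thread, and the functions add_float and add_double are hypothetical. A 128-bit vector, the width of an SSE register, holds four floats but only two doubles, so the float loop issues half as many vector additions for the same number of elements. The sketch shows the instruction count only; whether it becomes a 2x wall-clock gain depends on how the CPU executes 128-bit operations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* GCC/Clang vector extension types: each is one 16-byte (128-bit)
   vector, the width of an SSE register. */
typedef float  v4sf __attribute__((vector_size(16)));
typedef double v2df __attribute__((vector_size(16)));

/* Elements per 128-bit vector: 4 floats, but only 2 doubles. */
enum {
    FLOAT_LANES  = sizeof(v4sf) / sizeof(float),
    DOUBLE_LANES = sizeof(v2df) / sizeof(double)
};

/* c[i] = a[i] + b[i], four float additions per vector add.
   n must be a multiple of FLOAT_LANES. */
void add_float(const float *a, const float *b, float *c, size_t n)
{
    assert(n % FLOAT_LANES == 0);
    for (size_t i = 0; i < n; i += FLOAT_LANES) {
        v4sf va, vb, vc;
        memcpy(&va, a + i, sizeof va);   /* memcpy avoids alignment assumptions */
        memcpy(&vb, b + i, sizeof vb);
        vc = va + vb;                    /* one vector add = 4 float adds */
        memcpy(c + i, &vc, sizeof vc);
    }
}

/* Same loop for doubles: only two additions per vector add, so twice
   as many vector operations for the same n. */
void add_double(const double *a, const double *b, double *c, size_t n)
{
    assert(n % DOUBLE_LANES == 0);
    for (size_t i = 0; i < n; i += DOUBLE_LANES) {
        v2df va, vb, vc;
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        vc = va + vb;                    /* one vector add = 2 double adds */
        memcpy(c + i, &vc, sizeof vc);
    }
}
```

For 8 elements, add_float performs 2 vector additions and add_double performs 4.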

Now you guys got me interested.

Here is what I tried:

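#include <stdio.h>
#include <math.h>
int main ()
{
  const float foo = 29.123;   /* not used below */

  unsigned int    j,k;
  unsigned int    i;
  /* element type under test: change double to float or long double
     (all three arrays) to reproduce the other timings */
  double a[] = {1,2,3,4,5,6,7,8};
  double b[] = {5,6,7,8,9,10,11,12};
  double c[] = {0,0,0,0,0,0,0,0};

  for (k=0;k<100000;k++){
    for (j=0;j<10000;j++){
      /* 8-element inner loop: the candidate for -ftree-vectorize */
      for (i = 0; i < 8; i++)
        {
          c[ i ] = (j*k*(a[ i ]+b[ i ]));
        }
    }
  }
  printf("%f\n", c[3]);   /* print a result so the loops are not dead code */
  return 0;
}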

Compiled with gcc 4.1.1:
gcc -O3 -march=pentium-m -malign-double -mfpmath=sse -msse2 -Wall -o vect vect.c -ftree-vectorize -ftree-vectorizer-verbose=5

and run on an x86 Family 6 Model 13 Stepping 8 GenuineIntel, ~1862 MHz.

The multiplication by j and k is just there so that -O3 doesn't optimize
the outer loops into oblivion, and to raise the overall times above the
clock noise.
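To time all three element types from one source file, the kernel above could be parameterized on its type. This is a sketch, not the program actually used for the timings: the REAL macro and the run_kernel/time_kernel functions are hypothetical, and timing uses the standard clock() function (CPU time, not wall time).

```c
#include <time.h>

/* Hypothetical REAL macro: rebuild with -DREAL=float, -DREAL=double or
   -DREAL='long double' to time each element type from one source file. */
#ifndef REAL
#define REAL double
#endif

/* The kernel from vect.c with the loop bounds as parameters.  The j*k
   factor makes every result depend on the outer loop counters. */
void run_kernel(unsigned kmax, unsigned jmax, REAL c[8])
{
    const REAL a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    const REAL b[8] = {5, 6, 7, 8, 9, 10, 11, 12};

    for (unsigned k = 0; k < kmax; k++)
        for (unsigned j = 0; j < jmax; j++)
            for (unsigned i = 0; i < 8; i++)
                c[i] = (REAL)(j * k) * (a[i] + b[i]);
}

/* CPU time in seconds spent in run_kernel, measured with clock(). */
double time_kernel(unsigned kmax, unsigned jmax, REAL c[8])
{
    clock_t start = clock();
    run_kernel(kmax, jmax, c);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

A small main would call time_kernel(100000, 10000, c) to match the loop bounds above and print c[3] afterwards, so the result is used.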

The results are puzzling:

double, no vectorization:  23.797 s
double, vectorization:     23.858 s
float, no vectorization:   15.561 s
float, vectorization:       5.843 s

long double, no vectorization (SSE2 does not support long double): 33.344 s
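Expressing the timings as ratios makes the puzzle explicit: vectorization buys floats roughly a factor of 2.66 and doubles nothing (the ratio is slightly below 1). A small sketch of that arithmetic follows; the numbers are copied from the results above, while the struct and function names are illustrative only.

```c
#include <stdio.h>

/* Timings reported above, in seconds. */
struct timing {
    const char *type;
    double novec;  /* time without vectorization */
    double vec;    /* time with vectorization */
};

static const struct timing reported[] = {
    {"float",  15.561,  5.843},
    {"double", 23.797, 23.858},
};

/* Speedup of the vectorized run over the scalar one (> 1 means faster). */
double speedup(double t_novec, double t_vec)
{
    return t_novec / t_vec;
}

/* Print one "type speedup" line per reported element type. */
void print_speedups(void)
{
    for (size_t i = 0; i < sizeof reported / sizeof reported[0]; i++)
        printf("%-6s %.2fx\n", reported[i].type,
               speedup(reported[i].novec, reported[i].vec));
}
```

Calling print_speedups() prints "float  2.66x" and "double 1.00x".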

Ok, I do understand why long double is slower than double (I think).
But why does vectorization not make the slightest bit of difference
when using doubles?


> Where does this whole nonsense of "doubles are as fast as floats" come
> from? I know it's taught everywhere these days, but it is absolutely not
> true. Whoever first came up with it has a lot to answer for.

Indeed.



-- 
Space -- the final frontier



