help-gsl

Re: [Help-gsl] Questions about the code of some functions in cblas impl


From: José Luis García Pallero
Subject: Re: [Help-gsl] Questions about the code of some functions in cblas implementation
Date: Thu, 18 Jun 2009 23:42:57 +0200

On 18 June 2009 at 21:49, Brian Gough <address@hidden> wrote:

> You're right, I will remove that unnecessary line. Thanks.
> > 2- I can see that, as in the original BLAS, the *axpy functions have loop
> > unrolling by hand. Is this necessary today with the optimizations of the
> > compiler? I can see that other functions of gsl-cblas do not use
> > unrolling (though the original Fortran BLAS still does).
>
> As you say, it is probably not needed with a recent compiler. I don't
> know whether it does any harm to leave it though (has anyone tried a
> comparison?).  I am tempted to #ifdef it out and replace it with a
> plain loop.


I ran a test with the code below. The program takes one argument: the
number of loop iterations. To switch between the rolled and the unrolled
loop, change the value of the LU define. Compiled without optimization (gcc
test.c -o test) and run as time ./test 2000000000, the execution times on
my machine (iBook G4, gcc 4.3.0) are:

No loop unrolling: 27.5 s
Loop unrolling: 11.1 s

Compiled with the minimum optimization (gcc -O test.c -o test) the times
are:

No loop unrolling: 0.005 s
Loop unrolling: 0.6 s

And compiled with the most aggressive flag (gcc -O3 test.c -o test, but
without the -funroll-loops option) the times are:

No loop unrolling: 0.005 s
Loop unrolling: 0.005 s

I think that with modern compilers unrolling loops by hand makes no sense
(and in some cases is harmful, as the -O timings show), because with the
optimization options this work is actually done at compile time. At a high
optimization level the rolled and unrolled versions are equivalent, but the
simple (rolled) code is easier for a newcomer to read than the unrolled one.
In the case of BLAS, for example, we could also drop some of the if
statements that distinguish between incX and incY being equal to 1 or not.
The reference BLAS Fortran code for level 1, for example, was written in
the 70s, and I suppose that unrolling was very important back then.
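
To illustrate the point about incX and incY, here is a minimal sketch of a
rolled axpy (y <- alpha*x + y) that handles any increments in a single loop,
with no special case for increments equal to 1. This is not the gsl-cblas or
reference BLAS source: the name axpy_rolled and its signature are
hypothetical, and the handling of negative increments assumes the usual BLAS
convention of walking the vector backwards from its last element.

```c
/* Hypothetical sketch, not the gsl-cblas source: y <- alpha*x + y. */
void axpy_rolled(int n, double alpha,
                 const double *x, int incx,
                 double *y, int incy)
{
    /* Assumed BLAS convention: a negative increment starts at the far end. */
    int ix = (incx > 0) ? 0 : (n - 1) * (-incx);
    int iy = (incy > 0) ? 0 : (n - 1) * (-incy);
    int i;

    /* Nothing to do for an empty vector or a zero scale factor. */
    if (n <= 0 || alpha == 0.0)
        return;

    /* One plain loop covers unit and non-unit increments alike. */
    for (i = 0; i < n; i++) {
        y[iy] += alpha * x[ix];
        ix += incx;
        iy += incy;
    }
}
```

Whether the compiler unrolls or vectorizes this loop depends on the flags
used, so any comparison with a hand-unrolled version should be measured
rather than assumed.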

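#include <stdio.h>
#include <stdlib.h>

/* LU = 1: loop unrolled by hand; LU = 0: plain (rolled) loop */
#define LU 1

int main(int argc, char *argv[])
{
    /* unsigned, so that i*i wraps around instead of overflowing */
    unsigned int n = 0, i = 0, a = 0;

    if (argc < 2)
    {
        fprintf(stderr, "usage: ./test loop_size\n");
        return 1;
    }
    n = (unsigned int)strtoul(argv[1], NULL, 10);
    if (LU)
    {
        /* five steps per iteration; n is assumed to be a multiple of 5 */
        for (i = 0; i < n; i += 5)
        {
            a = i*i + i;
            a = (i+1)*(i+1) + (i+1);
            a = (i+2)*(i+2) + (i+2);
            a = (i+3)*(i+3) + (i+3);
            a = (i+4)*(i+4) + (i+4);
        }
    }
    else
    {
        for (i = 0; i < n; i++)
        {
            a = i*i + i;
        }
    }
    return 0;
}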
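
Since the value of a is never used in the program above, an optimizing
compiler is allowed to drop both loops entirely, which may be what the
0.005 s timings reflect. A minimal sketch of a variant that returns a running
sum, so that the work stays observable, could look like this. The function
names sum_rolled and sum_unrolled are made up for illustration, and unsigned
arithmetic is assumed so that large values wrap around instead of
overflowing. The compiler may still simplify these loops on its own.

```c
/* Hypothetical variants of the test loops; the result is returned
   so that the caller can use it. */
unsigned long sum_rolled(unsigned long n)
{
    unsigned long i, s = 0;

    for (i = 0; i < n; i++)
        s += i * i + i;
    return s;
}

unsigned long sum_unrolled(unsigned long n)
{
    unsigned long i = 0, s = 0;

    /* Main part: five terms per iteration. */
    for (; i + 5 <= n; i += 5) {
        s += i * i + i;
        s += (i + 1) * (i + 1) + (i + 1);
        s += (i + 2) * (i + 2) + (i + 2);
        s += (i + 3) * (i + 3) + (i + 3);
        s += (i + 4) * (i + 4) + (i + 4);
    }
    /* Remainder: fewer than five terms are left. */
    for (; i < n; i++)
        s += i * i + i;
    return s;
}
```

Printing the returned value from main (for example with printf) keeps the
result in use while timing either version.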

-- 
*****************************************
José Luis García Pallero
address@hidden
(o<
/ / \
V_/_
Use Debian GNU/Linux and enjoy!
*****************************************

