bug-gsl
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Bug-gsl] Improving gsl_blas_dsyrk routine


From: 王一
Subject: [Bug-gsl] Improving gsl_blas_dsyrk routine
Date: Sat, 12 Apr 2008 16:27:26 +0800

Dear GSL developers:
I benifits a lot recently by using your powerful routines. Thus I became 
interested in the detail of your routines. 
When I look through the BLAS source code, I found  gsl_blas_dsyrk routine can 
be significantly improved by a change of looping order. 
As I'm just a Ph.D. student of Statistical Genetics, I'm not sure my finding is 
generally true.
I just wish to help the lovely GSL! Please let me know the result if possible.

The version number of GSL:                              1.8
The hardware and operating system:      T7100 CPU, Windows XP
The compiler used, including version number and compilation options: 
Dev-C++/MinGW ,  full optimization
A description of the bug behavior: I use a real dataset with 6238 rows by 617 
columns of double precision. The gsl_blas_dsyrk routine has 1/5 speed than my 
routine in some cases.

Rows    Columns Transpose       Mine/sec        GSL_BLAS/sec
6238            617                     No                      26.8            
        25.2
6238            617                     Yes                     3.4             
                15.9
617                     6238            No                      2.4             
                 2.4
617                     6238            Yes                     54              
            256.5


A short program which exercises the bug:
My gsl_blas_dsyrk routine analog is listed below. It is a function of my C++ 
matrix template class. Hope this will not trouble you.

template   <class type>
void    matrix<type>::square(matrix&  M,  bool    T){
    register uint    r,  c, k;
    register    type    s;
 if(!T){
  resize(M.rn,    M.rn);
  for(r=0;    r<M.rn; r++)    for(c=r;    c<M.rn; c++){
   s=type(0);
   for(k=0;    k<M.cn; k++)    s+=M(r, k)*M(c, k);
   (*this)(c,  r)=(*this)(r,  c)=s;
  }
 }
 else{
  resize(M.cn,    M.cn);
// The follow code changes looping order, just like DGEMM does
  for(k=0;    k<M.rn; k++) for(r=0;    r<M.cn; r++){
   s=M(k, r);
   for(c=r;    c<M.cn; c++) (*this)(r,  c)+=s*M(k, c);
  }
  for(r=0;    r<M.cn; r++) for(c=r+1;    c<M.cn; c++) (*this)(c,  r)=(*this)(r, 
 c);
 }
}

And your corresponding codes (similar with the Fortan code) are:
  if (uplo == CblasUpper && trans == CblasNoTrans) {
    for (i = 0; i < N; i++) {
      for (j = i; j < N; j++) {
        BASE temp = 0.0;
        for (k = 0; k < K; k++) {
          temp += A[i * lda + k] * A[j * lda + k];
        }
        C[i * ldc + j] += alpha * temp;
      }
    }

  } else if (uplo == CblasUpper && trans == CblasTrans) {
// The following code may be improved
    for (i = 0; i < N; i++) {
      for (j = i; j < N; j++) {
        BASE temp = 0.0;
        for (k = 0; k < K; k++) {
          temp += A[k * lda + i] * A[k * lda + j];
        }
        C[i * ldc + j] += alpha * temp;
      }
    }





王一
2008-04-12

reply via email to

[Prev in Thread] Current Thread [Next in Thread]