help-octave
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: A faster sum


From: Schloegl Alois
Subject: Re: A faster sum
Date: Sat, 04 Jun 2005 22:51:46 +0200
User-agent: Internet Messaging Program (IMP) 4.1-cvs



There is a simple explanation for why sum is currently slow.  It's
related to the fact that it needs to work across any dimentsion for
N-d arrays.  Probably the current code for that is not so good.  We
can probably optimize a few simple cases.  Or maybe someone will have
a smarter way to do what we need, perhaps by doing something more
intelligent with strides in a single loop or something.  I don't have
time to think about how to fix the problem just now, but the code to
look at is in the macro MX_ND_REDUCTION in liboctave/mx-inlines.cc.

jwe



I was looking into this in more detail. When running the following script using SUMSKIPNAN.OCT compiled from http://cvs.sourceforge.net/viewcvs.py/*checkout*/octave/octave-forge/extra/NaN/sumskipnan.cc


f = { 'sum(X)', 'sum(X,1)', 'sum(X,2)', 'sumskipnan(X)', 'sumskipnan(X,1)', 'sumskipnan(X,2)','ones(1,n)*X','X*ones(n,1)'};
K =repmat(NaN,length(f),16);
for N=[12],%0:13];
       n = ceil(2^N*1.4);
       X = rand(n);
       for k = 1:length(f),
               t = cputime;
               s = eval(f{k});
               K(k,N+1) = cputime-t;
               fprintf(1,'%2i %2i\t%15s\t%5.2f\n',k,n,f{k},K(k,N+1)/K(1,N+1));
       end;
end;

one gets this result. 1 5735          sum(X)  1.00
2 5735        sum(X,1)  1.00
3 5735        sum(X,2)  6.33
4 5735   sumskipnan(X)  1.11
5 5735 sumskipnan(X,1)  1.11
6 5735 sumskipnan(X,2)  5.17
7 5735     ones(1,n)*X  0.25
8 5735     X*ones(n,1)  0.70

SUMKSKIPNAN.OCT does not use the macro MX_ND_REDUCTION. Despite, it shows that slowdown for DIM=2; A similar effect is shown in Matlab. Here are the results: 1 5735 sum(X) 1.00
2 5735        sum(X,1)  0.98
3 5735        sum(X,2) 12.68
4 5735   sumskipnan(X)  1.52
5 5735 sumskipnan(X,1)  1.52
6 5735 sumskipnan(X,2) 14.62
7 5735     ones(1,n)*X  0.56
8 5735     X*ones(n,1)  0.64

We can also see that the matrix multiplication is much faster for DIM=2. Hence, the slowdown can not only be explained only by an inefficent implementation of SUM or SUMSKIPNAN. This suggests that the slow down is caused mainly by the cache lines. Using ATLAS/BLAS implementing SUM could provide a significant performance gain. However, I'm still wondering how SUMSKIPNAN could be implemented efficiently using ATLAS or BLAS. Any ideas ?


Alois



-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web:  http://www.octave.org
How to fund new projects:  http://www.octave.org/funding.html
Subscription information:  http://www.octave.org/archive.html
-------------------------------------------------------------



reply via email to

[Prev in Thread] Current Thread [Next in Thread]