help-octave

Re: Octave 64-bit indexing built with ATLAS


From: Felix Willenborg
Subject: Re: Octave 64-bit indexing built with ATLAS
Date: Tue, 12 Sep 2017 11:38:30 +0200
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0

Dear Dmitri,

thanks for your reply. Good to know. As I said, I wouldn't call myself an expert on this topic. When running matrix multiplication instead of element-wise multiplication, I get the following output:
MFLOPS: (5.122373 +- 0.029796) (Reference LAPACK)
MFLOPS: (24.231716 +- 0.074426) (ATLAS)
MFLOPS: (29.056399 +- 0.131886) (OpenBLAS)
Now all cores are being used as well (except with the reference LAPACK).
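
For completeness, something like the following is what I mean by the matrix-multiplication version (only a sketch, not the exact script I ran; as Dmitri notes below, n*n is not the real flop count for a matrix product, so this sketch uses the conventional 2*n^3 count instead):

N = 4;                            % far fewer repetitions, x * x is expensive
n = 4096;
mflops = zeros (1, N);
for i = 1:N
    x = rand (n, n);
    tic;
    y = x * x;                    % matrix product, dispatched to the BLAS
    t = toc;
    mflops(i) = 2*n^3 / t / 1e6;  % roughly 2*n^3 flops for an n-by-n product
end
printf ('MFLOPS: (%.2f +- %.2f)\n', mean (mflops), std (mflops));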

I have to make one correction to my compilation procedure, though: ATLAS must not be compiled with 'make -j', which crashes every time, but with a plain 'make'. Maybe someone wants to validate the whole procedure and try it out so it can be added to https://www.gnu.org/software/octave/doc/interpreter/Compiling-Octave-with-64_002dbit-Indexing.html?
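
Whoever validates the procedure could also run a quick sanity check from within Octave that 64-bit indexing is really enabled. This is just my suggestion, not part of the manual page linked above:

% sizemax() reports the largest array index Octave supports; in a
% 64-bit-indexing build it should be 2^63 - 1 rather than 2^31 - 1.
sizemax ()
% Creating an array with more than 2^31 elements only works with 64-bit
% indexing (and needs a few GB of RAM, so shrink the size if necessary).
x = zeros (3e9, 1, 'int8');
numel (x)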

Best wishes,
Felix

On 12.09.2017 at 00:51, Dmitri A. Sergatskov wrote:


On Mon, Sep 11, 2017 at 1:39 PM, Felix Willenborg <address@hidden> wrote:


octave_mflops.m:
% Crude benchmark: time element-wise multiplication of an n-by-n random
% matrix N times and report mean and standard deviation in MFLOPS.
N = 400;
n = 4096;
mflops = zeros (1, N);
for i = 1:N
    x = rand (n, n);
    tic;
    x = x .* x;
    y = toc;
    mflops(i) = n*n / y / 1e6;    % one multiplication per element
end

mflops_mean = mean (mflops);
mflops_sig  = std (mflops);
printf ('MFLOPS: (%.2f +- %.2f)\n', mflops_mean, mflops_sig);
Now comes the part that confuses me a little. For the OpenBLAS build and the ATLAS build, I receive the following values:
MFLOPS: (370.47 +- 7.60) (OpenBLAS)
MFLOPS: (370.43 +- 7.27) (ATLAS)
I expected ATLAS to be faster than OpenBLAS. Also, when monitoring the load with 'htop', only one CPU is fully loaded, although I expected ATLAS to use parallel threading, which I tried to ensure by using libtatlas_Oct64.so. Am I expecting the wrong thing? Can someone explain what I did wrong?


OpenBLAS is (generally) faster.
x .* x does not use BLAS/LAPACK (hence ATLAS) code, which is why your result did not change.
Whatever number you calculate is a benchmark, but probably not actual MFLOPS.
Replacing x .* x with x * x (and setting N = 4), I get the following on an old 4-core computer:

With OpenBLAS:
octave:1> octave_mflops
MFLOPS: (5.19 +- 0.06)
With ATLAS:
octave:1> octave_mflops
MFLOPS: (3.56 +- 0.01)

Regards,

Dmitri.


-- 
Felix Willenborg

Machine Learning Group and Cluster of Excellence Hearing4all
Department of Medical Physics and Acoustics
School of Medicine and Health Sciences
Carl von Ossietzky Universität Oldenburg

Küpkersweg 74, 26129 Oldenburg
Tel: +49 441 798 3945

https://www.uni-oldenburg.de/machine-learning/
