help-octave

Re: octave 64-bit and multi-threading


From: Grothausmann, Roman Dr.
Subject: Re: octave 64-bit and multi-threading
Date: Tue, 21 Jul 2015 14:24:41 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.7.0

Thanks Dmitri and Tatsuro,


I did some more tests on our server with 24 cores (12*2 with hyper-threading). Both octave binaries (my self-compiled one and the one from Debian) link against libblas, but mine also links against libcblas and libf77blas (even though I chose a minimal configuration, i.e. no qhull, sparse etc.):


ldd /usr/bin/octave-cli | grep blas
        libblas.so.3 => /usr/lib/libblas.so.3 (0x00007fae50ab6000)

ldd /opt/octave-4.0.0/bin/octave-cli-4.0.0  | grep blas
        libcblas.so.3 => /usr/lib/libcblas.so.3 (0x00007f86e53f8000)
        libf77blas.so.3 => /usr/lib/libf77blas.so.3 (0x00007f86e51d9000)
        libblas.so.3 => /usr/lib/libblas.so.3 (0x00007f86e05dc000)
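
To compare binaries side by side, the ldd output can be reduced to just the library names. The following is a minimal sketch; the helper name blas_libs is made up for illustration, and the sample input is simply the ldd output quoted above.

```shell
# blas_libs: read `ldd` output on stdin and print the name of every
# BLAS/LAPACK/ATLAS library, one per line (hypothetical helper).
blas_libs() {
    awk '$1 ~ /blas|lapack|atlas/ { print $1 }'
}

# With a real binary this would be: ldd /usr/bin/octave-cli | blas_libs
# Here the function is fed the ldd output of the self-compiled binary.
self_compiled=$(blas_libs <<'EOF'
        libcblas.so.3 => /usr/lib/libcblas.so.3 (0x00007f86e53f8000)
        libf77blas.so.3 => /usr/lib/libf77blas.so.3 (0x00007f86e51d9000)
        libblas.so.3 => /usr/lib/libblas.so.3 (0x00007f86e05dc000)
EOF
)
printf '%s\n' "$self_compiled"
```

Since ldd only reports names and paths, it may also be worth resolving any symlinks behind those paths (for example with readlink -f) to see which package actually provides each library.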

With the libblas3 from Debian, running:

/usr/bin/time -v /usr/bin/octave-cli << EOF
tic
bigMatrixA = rand(3000000,80);
bigMatrixB = rand(80,700);
bigMatrixC = bigMatrixA * bigMatrixB;
toc
disp("done");
EOF

        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 4:47.89
        Maximum resident set size (kbytes): 18322764

This takes 286.552 seconds.
With libblas3 from Debian's libatlas3-base.deb it takes 28.3687 seconds at 176% CPU.

With my self-compiled libatlas3-base it runs in only 8.38979 seconds and gets 826% CPU in total (peaking at about 2400% CPU, as it should).
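
To put the three timings for the Debian octave binary in relation, the speedup over the plain libblas3 run can be computed directly. This is only a sketch of the arithmetic; the variable and function names are chosen for illustration.

```shell
# Wall-clock times (seconds) reported by tic/toc for /usr/bin/octave-cli
# with the three BLAS variants described above.
ref=286.552        # libblas3 from Debian
atlas_deb=28.3687  # libblas3 from Debian's libatlas3-base
atlas_own=8.38979  # self-compiled libatlas3-base

# speedup: print the ratio baseline/time with one decimal place.
speedup() {
    awk -v base="$1" -v t="$2" 'BEGIN { printf "%.1f\n", base / t }'
}

echo "Debian ATLAS:        $(speedup "$ref" "$atlas_deb")x"
echo "self-compiled ATLAS: $(speedup "$ref" "$atlas_own")x"
```

This gives roughly 10.1x for the Debian ATLAS and 34.2x for the self-compiled one. The timed region also contains the two rand calls, so these ratios do not describe the matrix product alone.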


For my self-compiled octave with libblas3 from libatlas3-base.deb (also has libcblas.so.3 and libf77blas.so.3):


/usr/bin/time -v /opt/octave-4.0.0/bin/octave-cli << EOF
tic
bigMatrixA = rand(3000000,80);
bigMatrixB = rand(80,700);
bigMatrixC = bigMatrixA * bigMatrixB;
toc
disp("done");
EOF

        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:53.45
        Maximum resident set size (kbytes): 18323000

takes 51.2007 seconds.
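
The maximum resident set size is nearly the same in both runs and is close to what the three matrices alone require. A quick check of that arithmetic, assuming double precision (8 bytes per element) and assuming that the "kbytes" reported by time are units of 1024 bytes:

```shell
# Sizes in bytes, assuming 8 bytes per matrix element.
bytes_a=$(( 3000000 * 80 * 8 ))    # bigMatrixA: 3000000 x 80
bytes_b=$(( 80 * 700 * 8 ))        # bigMatrixB: 80 x 700
bytes_c=$(( 3000000 * 700 * 8 ))   # bigMatrixC: 3000000 x 700

# Total in units of 1024 bytes (integer division; needs 64-bit shell arithmetic).
total_kib=$(( (bytes_a + bytes_b + bytes_c) / 1024 ))
echo "matrices alone:   $total_kib kB"
echo "reported max RSS: 18323000 kB"
```

The matrices account for 18281687 kB, so the memory footprint is dominated by the data and leaves only about 40 MB for everything else in the process.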

With my self-compiled libatlas3-base it still runs for 39.052 seconds and still gets only 99% CPU.

So I conclude that my self-compiled libatlas3-base seems to be multi-threaded, but my self-compiled octave does not make use of it. Do I need some extra configure options? Or does octave need additional, optional packages to run multi-threaded?
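
The check behind this conclusion (CPU share above or below 100%) can be scripted when many builds have to be compared. The following is a rough sketch that parses the output format of GNU time -v shown above; the helper names cpu_percent and is_multithreaded are made up for illustration.

```shell
# cpu_percent: read GNU `time -v` output on stdin and print the integer
# from the "Percent of CPU this job got" line (hypothetical helper).
cpu_percent() {
    sed -n 's/^[[:space:]]*Percent of CPU this job got: \([0-9][0-9]*\)%.*$/\1/p'
}

# is_multithreaded: succeed if the reported CPU share exceeds 100%.
is_multithreaded() {
    [ "$1" -gt 100 ]
}

# Sample taken from the self-compiled octave run quoted in this message.
pct=$(cpu_percent <<'EOF'
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:53.45
EOF
)
if is_multithreaded "$pct"; then
    echo "$pct% CPU: more than one core was busy"
else
    echo "$pct% CPU: effectively single-threaded"
fi
```

GNU time writes its report to stderr, so in a real pipeline the stream has to be redirected (2>&1) before it reaches cpu_percent. The percentage is an average over the whole run, so a short multi-threaded phase inside a long single-threaded one can still end up close to 100%.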


My self-compiled octave is built without Qhull, HDF5, GLPK, cURL, gl2ps, OSMesa, qrupdate and ARPACK:

./configure --prefix /opt/octave-4.0.0/ --disable-docs --enable-64


Even when also linking against SuiteSparse (also compiled for 64-bit, providing AMD, CAMD, COLAMD, CCOLAMD, CHOLMOD, CXSparse and UMFPACK), the performance of the matrix multiplication does not improve significantly:

ss=/opt/SuiteSparse-4.4.4/ LD_LIBRARY_PATH="$ss/lib" CPPFLAGS="-I$ss/include" LDFLAGS="-L$ss/lib" \
./configure --prefix /opt/octave-4.0.0/ --disable-docs --enable-64 \
--with-amd="-lamd -lsuitesparseconfig -lrt" \
--with-camd="-lcamd -lamd -lsuitesparseconfig -lrt" \
--with-colamd="-lcolamd -lcamd -lamd -lsuitesparseconfig -lrt" \
--with-ccolamd="-lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lrt" \
--with-cholmod="-lcholmod -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lrt" \
--with-umfpack="-lumfpack -lcholmod -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lrt -llapack"


Thanks for any help or hints
Roman



On 21/07/15 02:36, Dmitri A. Sergatskov wrote:


On Mon, Jul 20, 2015 at 2:37 PM, Grothausmann, Roman Dr. <address@hidden> wrote:

    Dear mailing list members,


    Using instructions from
    https://www.gnu.org/software/octave/doc/interpreter/Compiling-Octave-with-64_002dbit-Indexing.html
    I managed to compile octave-4.0.0 with --enable-64 using a 64-bit atlas
    library under debian.
    The test
    a = zeros (1024*1024*1024*3, 1, 'int8');
    works.
    I was also able to load 10GB units with fread and do some computations with
    it successfully.

    According to
    http://stackoverflow.com/questions/11889118/get-gnu-octave-to-work-with-a-multicore-processor-multithreading
    octave should be multi-threaded when using a self-compiled atlas lib.
    However the test

    tic
    bigMatrixA = rand(3000000,80);
    bigMatrixB = rand(80,30);
    bigMatrixC = bigMatrixA * bigMatrixB;
    toc
    disp("done");

    takes 4.8s with the debian default octave (3.8.2 32-bit)
    and 7.1s with my self-compiled octave (4.0.0 64-bit)

    Is it slower because it is now 64-bit?
    Are there better tests for multi-threading in octave?



Run top in a separate terminal while running your test and see if the octave process takes more than 100% CPU.
Compiling an optimized atlas is tricky. For one thing, make sure you have disabled
CPU throttling.


    Thanks for any help or hints
    Roman


Dmitri.

--
Dr. Roman Grothausmann

Tomographie und Digitale Bildverarbeitung
Tomography and Digital Image Analysis

Institut für Funktionelle und Angewandte Anatomie, OE 4120
Medizinische Hochschule Hannover
Carl-Neuberg-Str. 1
D-30625 Hannover

Tel. +49 511 532-2900


