help-octave

Re: Performance problem running multiple copies of Octave on a multicore


From: Olaf Till
Subject: Re: Performance problem running multiple copies of Octave on a multicore processor
Date: Thu, 10 Dec 2015 14:16:19 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

On Thu, Dec 10, 2015 at 11:47:02AM +0000, Ian McCallion wrote:
> On 1 December 2015 at 08:09, Olaf Till <address@hidden> wrote:
> ><snip>
> > An explanation could be using up the common L2 cache of the processor
> > cores, so that they must share the main memory bus. Maybe some
> > calculation in the newer Octave is spread over a larger memory area.
> >
> > It could be useful to find the smallest possible unit (e.g a native
> > Octave function, or a builtin function called by it) which shows your
> > problem.
> >
> > Your idea with different BLAS libraries seemed good to me, make sure
> > you tested this influence correctly.
> 
> On 5 December 2015 at 11:30, Ian McCallion <address@hidden> wrote:
> > <snip>
> >The bad performance definitely follows Octave 4.0.0.
> >
> > I am planning to instrument the code to find where the excess time is
> > being consumed
> 
> I instrumented the code. This showed matrix multiplication and
> log(matrix) are affected, whereas  dot multiplication and division do
> not seem to be affected.
> 
> I have written demonstrator functions that show the problem for matrix
> multiplication:
> 
>     perf1(8,1000,12,500,1000);
>     Running 8 processes.
>     4.0.0,  libopenblas382, rand(1000,12)*rand(12,500) 1000 times
>                took 11.07 seconds
>     3.8.2,  libopenblas382, rand(1000,12)*rand(12,500) 1000 times
>                took 7.22 seconds
> 
> The same magnitude of task on one processor:
> 
>    perf1(1,1000,12,500,8000)
>    Running 1 process
>    4.0.0,  libopenblas382, rand(1000,12)*rand(12,500) 8000 times
>               took 13.04 seconds
>    3.8.2,  libopenblas382, rand(1000,12)*rand(12,500) 8000 times
>                took 21.14 seconds
> 
> I will happily share the demonstrator with anyone who wants it. It may
> be useful to know whether the problem shows up on other processors and
> other motherboards. Mine is an intel i7-3610QM ASUS laptop with 8GB
> RAM running win7.

The code for matrix multiplication (in liboctave/array/dMatrix.cc,
function xgemm and how it is called) has only code-formatting changes
between Octave 3.8.2 and 4.0.0, so the same lapack functions should be
called. This suggests that the cause is a difference in the (lapack
or) blas library actually used. (I know you stated that the same blas
was used in both, but you didn't state how you checked it.)

As for your test, your code surely called rand() _before_ the loop,
not within it? It would be better to post the code, including any
shell scripts or similar used to call it.
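
To make clear what kind of calling script is meant, here is a minimal sketch of how several copies of a command could be started in parallel and timed from a POSIX shell. The helper name run_copies is hypothetical, and this is not the perf1 demonstrator, whose code has not been posted; it only assumes a Unix-like shell.

```shell
# Hypothetical helper: start N copies of a command in parallel and
# report the wall-clock time (whole seconds) until all have finished.
run_copies() {
    n=$1
    shift
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" &              # launch one copy in the background
        i=$((i + 1))
    done
    wait                    # block until every background copy has exited
    end=$(date +%s)
    echo "$n copies took $((end - start)) seconds"
}

# Example call (assumes an executable named octave-cli is on PATH).
# Note that rand() is called once, before the loop, so only the
# multiplication is repeated:
#
#   run_copies 8 octave-cli --eval \
#     'A = rand(1000,12); B = rand(12,500); for k = 1:1000, C = A*B; end'
```

Running the same call once per Octave version, with 1 copy and with 8 copies, would give numbers comparable to the ones quoted above.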

What do

ldd /path/to/octave-3.8.2 | grep lapack
ldd /path/to/octave-3.8.2 | grep blas

and

ldd /path/to/octave-4.0.0 | grep lapack
ldd /path/to/octave-4.0.0 | grep blas

output on your system? (Replace octave-... with octave-...-cli or
whatever the (symlink to the) final executable is named.)
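
The same check can be wrapped in a small shell function so that both executables are inspected in one go. This is only a sketch: the names filter_blas and show_blas_deps are made up for illustration, and the paths in the usage comment are placeholders to be replaced by the real executables.

```shell
# Hypothetical helper: keep only the lines of ldd-style input that
# mention blas or lapack (case-insensitive).
filter_blas() {
    grep -i -E 'blas|lapack'
}

# For each executable given, print the BLAS/LAPACK shared libraries
# that the dynamic linker resolves for it.
show_blas_deps() {
    for bin in "$@"; do
        echo "== $bin"
        ldd "$bin" | filter_blas || echo "(no blas/lapack line found)"
    done
}

# Example call (placeholder paths):
#
#   show_blas_deps /path/to/octave-3.8.2 /path/to/octave-4.0.0
```

If the two versions print different library paths, they are not using the same blas, whatever the file names suggest.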

Olaf

-- 
public key id EAFE0591, e.g. on x-hkp://pool.sks-keyservers.net
