help-octave

Re: Performance problem running multiple copies of Octave on a multicore


From: Ian McCallion
Subject: Re: Performance problem running multiple copies of Octave on a multicore processor
Date: Thu, 10 Dec 2015 11:47:02 +0000

Hi Olaf,

As a reminder of the topic here, Octave 4.0.0 is around 50% slower
than Octave 3.8.2 when a large signal processing task is shared across
multiple copies of Octave running in parallel on a multicore CPU. I'm
looking either for a way round the problem or for recognition that
this is an Octave bug.

On 1 December 2015 at 08:09, Olaf Till <address@hidden> wrote:
> <snip>
> An explanation could be using up the common L2 cache of the processor
> cores, so that they must share the main memory bus. Maybe some
> calculation in the newer Octave is spread over a larger memory area.
>
> It could be useful to find the smallest possible unit (e.g. a native
> Octave function, or a builtin function called by it) which shows your
> problem.
>
> Your idea with different BLAS libraries seemed good to me, make sure
> you tested this influence correctly.

On 5 December 2015 at 11:30, Ian McCallion <address@hidden> wrote:
> <snip>
> The bad performance definitely follows Octave 4.0.0.
>
> I am planning to instrument the code to find where the excess time is
> being consumed

I instrumented the code. This showed that matrix multiplication and
log(matrix) are affected, whereas element-wise (dot) multiplication
and division do not seem to be affected.
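
A minimal sketch of how the four operations could be timed separately.
This is not the instrumentation actually used; the matrix sizes, the
repeat count and the variable names are assumptions chosen only for
illustration:

```octave
% Illustrative only: sizes, reps and names are assumptions, not taken
% from the instrumented code discussed in this thread.
a = rand (1000, 500) + 1;   % shifted away from zero so log() and ./ stay finite
b = rand (1000, 500) + 1;

% One zero-argument handle per operation under test.
ops   = {@() a * b.', @() log (a), @() a .* b, @() a ./ b};
names = {"a * b.'", "log (a)", "a .* b", "a ./ b"};
reps  = 20;

for j = 1:numel (ops)
  t0 = tic ();
  for i = 1:reps
    r = ops{j} ();          % result is discarded; only the time matters
  end
  printf ("%-8s %d times took %.3f seconds\n", names{j}, reps, toc (t0));
end
```

Running the same script at the same time in several Octave sessions, and
again in a single session, would show which operations slow down under
parallel load.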

I have written demonstrator functions that show the problem for matrix
multiplication:

    perf1(8,1000,12,500,1000);
    Running 8 processes.
    4.0.0,  libopenblas382, rand(1000,12)*rand(12,500) 1000 times
               took 11.07 seconds
    3.8.2,  libopenblas382, rand(1000,12)*rand(12,500) 1000 times
               took 7.22 seconds

The same magnitude of task on one processor:

    perf1(1,1000,12,500,8000)
    Running 1 process
    4.0.0,  libopenblas382, rand(1000,12)*rand(12,500) 8000 times
               took 13.04 seconds
    3.8.2,  libopenblas382, rand(1000,12)*rand(12,500) 8000 times
               took 21.14 seconds
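
For anyone without the demonstrator, the per-process work can be
approximated with a few lines. This is a sketch, not perf1 itself:
time_matmul is a hypothetical name, and it assumes, as the output
labels above suggest, that the random matrices are generated inside the
timed loop:

```octave
1;  % marks this file as a script so the function below can be defined inline

% Hypothetical helper, NOT the original perf1: time `reps` products of an
% m-by-k and a k-by-n random matrix; returns elapsed wall-clock seconds.
function elapsed = time_matmul (m, k, n, reps)
  t0 = tic ();
  for i = 1:reps
    c = rand (m, k) * rand (k, n);   % generation is included in the timing
  end
  elapsed = toc (t0);
end

elapsed = time_matmul (1000, 12, 500, 1000);
printf ("%s, rand(1000,12)*rand(12,500) 1000 times took %.2f seconds\n",
        version (), elapsed);
```

Starting this in 8 Octave sessions at once, and then once with 8000
repetitions in a single session, approximates the two cases above. It
does not launch or synchronise the processes itself.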

I will happily share the demonstrator with anyone who wants it. It
would be useful to know whether the problem shows up on other
processors and other motherboards. Mine is an ASUS laptop with an
Intel i7-3610QM and 8 GB RAM, running Windows 7.

Cheers... Ian


