help-octave
Re: Performance problem running multiple copies of Octave on a multicore processor


From: Mike Miller
Subject: Re: Performance problem running multiple copies of Octave on a multicore processor
Date: Thu, 10 Dec 2015 13:11:43 -0500
User-agent: Mutt/1.5.24 (2015-08-30)

On Thu, Dec 10, 2015 at 13:46:53 +0000, Ian McCallion wrote:
> Hi Olaf,
> 
> Here is the code. ldd is Unix-only, so I haven't run the commands you
> suggested. However, I will send you the Windows equivalent shortly.
> 
>  The cmd script is how I switch blas versions.
> 
> One important thing I note (which I thought was a testing error and
> ignored earlier), is that the only combination that runs fast is
> Octave 3.8.0 and the blas shipped with it.
> 
> I will experiment with different versions of lapack here. Please take
> a quick look at the code to ensure there are no silly errors. You will
> need source changes for your environment to run it.

For comparison, here is what I get calling your functions with Octave
3.8.2 and 4.0.0 in my Debian environment (using OpenBLAS). I also upped
the count a little to something meaningful for my system.

Without OMP_NUM_THREADS:

  >> perf1 (8, 1000, 12, 500, 5000)
  4.0.0,                , rand(1000,12)*rand(12,500) 5000 times took 14.52 seconds
  3.8.2,                , rand(1000,12)*rand(12,500) 5000 times took 19.39 seconds
  Running 8 processes
  >> perf1 (1, 1000, 12, 500, 8*5000)
  4.0.0,                , rand(1000,12)*rand(12,500) 40000 times took 13.92 seconds
  3.8.2,                , rand(1000,12)*rand(12,500) 40000 times took 13.98 seconds
  Running 1 processes

With OMP_NUM_THREADS set to 1 (disabling multithreading within
OpenBLAS):

  >> perf1 (8, 1000, 12, 500, 5000)
  4.0.0,           1.0.0, rand(1000,12)*rand(12,500) 5000 times took 10.16 seconds
  3.8.2,           1.0.0, rand(1000,12)*rand(12,500) 5000 times took 13.02 seconds
  Running 8 processes
  >> perf1 (1, 1000, 12, 500, 8*5000)
  4.0.0,           1.0.0, rand(1000,12)*rand(12,500) 40000 times took 29.05 seconds
  3.8.2,           1.0.0, rand(1000,12)*rand(12,500) 40000 times took 29.23 seconds
  Running 1 processes
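
To make the comparison easier to read, here is a rough Python sketch that
turns the timings above into ratios. It assumes that both calls do the same
total work (8 x 5000 = 40000 multiplications) and that each reported time is
a comparable elapsed time; the output above does not confirm how perf1
measures, so treat the ratios as indicative only. The dictionary layout and
the function name are mine, not part of perf1.

```python
# Timings in seconds, copied from the runs above.
# Key: (OpenBLAS threading, Octave version) -> (8 processes, 1 process)
timings = {
    ("default", "4.0.0"): (14.52, 13.92),
    ("default", "3.8.2"): (19.39, 13.98),
    ("OMP_NUM_THREADS=1", "4.0.0"): (10.16, 29.05),
    ("OMP_NUM_THREADS=1", "3.8.2"): (13.02, 29.23),
}


def speedup_from_processes(eight_procs, one_proc):
    """Ratio of single-process time to 8-process time (above 1 means 8 processes won)."""
    return one_proc / eight_procs


for (threading, version), (t8, t1) in timings.items():
    ratio = speedup_from_processes(t8, t1)
    print(f"{threading:>17}, {version}: 8 processes vs 1 process = {ratio:.2f}x")
```

Under those assumptions, a ratio below 1 means splitting the work across 8
Octave processes did not help, and a ratio above 1 means it did.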


So the observations I make:

 • No significant difference between 3.8.2 and 4.0.0; 4.0.0 may be
   slightly faster when running multiple jobs
 • Make sure to take into account the interaction between running
   parallel Octave jobs and the use of OpenMP within OpenBLAS: in the
   runs above, 8 processes only beat a single process when
   OMP_NUM_THREADS was set to 1
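
For anyone who wants to control that interaction explicitly, here is a
minimal Python sketch of starting several worker processes with
OMP_NUM_THREADS set before each one starts. This is not how perf1 does it
(I am only illustrating the idea); the helper names are hypothetical, and
the worker command is a stand-in that just echoes the variable back.

```python
import os
import subprocess
import sys


def single_threaded_env(base_env=None, threads=1):
    """Return a copy of the environment with OMP_NUM_THREADS set."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(threads)
    return env


def launch_workers(command, count, threads=1):
    """Start `count` copies of `command` concurrently and collect their stdout."""
    env = single_threaded_env(threads=threads)
    procs = [
        subprocess.Popen(command, env=env, stdout=subprocess.PIPE, text=True)
        for _ in range(count)
    ]
    # communicate() waits for each process to finish and returns (stdout, stderr).
    return [p.communicate()[0] for p in procs]


if __name__ == "__main__":
    # Stand-in worker that prints the variable it inherited. To run Octave
    # instead, swap in something like ["octave", "--eval", "<your code>"]
    # (assumes octave is on the PATH).
    worker = [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"]
    print(launch_workers(worker, count=4))
```

The variable is set in the child's environment rather than from inside the
running job because OpenMP runtimes generally read it at startup.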

Are you using the 4.0.0 official binary? Which 3.8.2 binary are you
using?

-- 
mike


