help-octave

Re: cputime and parallel computing


From: Olaf Till
Subject: Re: cputime and parallel computing
Date: Sat, 8 Sep 2018 13:58:50 +0200
User-agent: NeoMutt/20170113 (1.7.2)

On Thu, Sep 06, 2018 at 02:35:03PM -0500, Anton Dereventsov wrote:
> I've noticed that cputime() does not measure time correctly with the
> parallel package.
> For instance, in the following script I multiply two random 1000-by-1000
> matrices 10 times and time the execution using tic-toc and cputime. I use a
> standard 'for' cycle, an 'arrayfun' function, and a 'pararrayfun' function:
> 
>       N = 1000;
>       K = 10;
>       
>       t0 = tic; t1 = cputime;
>               for i = 1 : K
>                       rand(N)*rand(N);
>               endfor
>       printf('wall time: %.6f | cpu time: %.6f\n', toc(t0), cputime-t1);
>       
>       t0 = tic; t1 = cputime;
>               arrayfun(@(n) rand(N)*rand(N), 1:K, 'UniformOutput', 0);
>       printf('wall time: %.6f | cpu time: %.6f\n', toc(t0), cputime-t1);
>       
>       pkg load parallel
>       t0 = tic; t1 = cputime;
>               pararrayfun(nproc, @(n) rand(N)*rand(N), 1:K, 'VerboseLevel', 0);
>       printf('wall time: %.6f | cpu time: %.6f\n', toc(t0), cputime-t1);
> 
> Here is the output:
>       wall time: 5.190973 | cpu time: 5.228000
>       wall time: 5.373872 | cpu time: 5.404000
>       wall time: 3.517548 | cpu time: 0.116000
> 
> This code was executed in Octave 4.4.1 on Ubuntu 16.04 and the results are
> consistent across runs; however, the last cpu time measurement does not
> seem to be correct. I would expect all three cpu times to be very similar
> since the computed problem is the same. Moreover, the last measurement is
> far too small (it becomes even more obvious if N is increased).

No, that is what is to be expected. In pararrayfun, the code is
executed in (nproc) extra processes. Your 'cputime-t1' after calling
pararrayfun measures the CPU usage of the outer process, which doesn't
do the multiplications but only waits for the results.
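
As a rough illustration of a process that only waits, the sketch
below replaces the wait for worker results with a plain pause(). This
is a stand-in chosen for the example, not what pararrayfun does
internally; the 0.5 second duration and the variable names are
arbitrary.

```octave
% Stand-in for "waiting on workers": the process idles for 0.5 s.
t0 = tic; t1 = cputime;
pause(0.5);                      % waiting, not computing
cpu_elapsed  = cputime - t1;     % CPU time used by this process only
wall_elapsed = toc(t0);          % elapsed real time
printf('wall time: %.6f | cpu time: %.6f\n', wall_elapsed, cpu_elapsed);
```

The wall time should come out near 0.5 seconds while the CPU time
stays close to zero, which is the same pattern as in your third
measurement.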

> Is there a way to correctly measure cpu time on parallelized chunks
> of code?

Currently there isn't; this would have to be done inside (the code of)
pararrayfun. But the sum of the usage times of the different processor
cores should be similar to the single-core CPU usage without
parallelization, even if the overall processing time is shorter. Is
there really a need for measuring CPU usage within
parcellfun/pararrayfun?
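
Short of changing pararrayfun itself, one thing that could be tried is
to take the measurement inside the function that is handed to it, so
that each call reports the CPU time of whichever process ran it. The
sketch below does this serially with arrayfun so that it runs without
the parallel package. The helper name timed_task is hypothetical, and
N and K are reduced from your values to keep the run short.

```octave
N = 300;   % smaller than the original 1000, only to keep the run short
K = 4;

% Hypothetical helper: runs one multiplication and returns the CPU time
% consumed by the process that executes it.
function cpu = timed_task (N)
  t = cputime;
  rand(N)*rand(N);
  cpu = cputime - t;
endfunction

% Serial version: one CPU time per task, then the sum over all tasks.
cpu_per_task = arrayfun(@(k) timed_task(N), 1:K);
total_cpu = sum(cpu_per_task);
printf('total cpu time over %d tasks: %.6f\n', K, total_cpu);
```

With the parallel package loaded, replacing the arrayfun call with
pararrayfun(nproc, @(k) timed_task(N), 1:K) would presumably return the
per-task CPU times as seen by the worker processes. I have not tested
this, so treat it as an assumption.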

> Also, how come that the cpu time is slightly bigger than the wall time in
> the first two measurements even though they are executed on a single core?

I'm not sure, but the difference is so small that I would attribute
it to calling cputime() _after_ toc(). Have you tried it in reverse
order?
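
A minimal sketch of the reversed order, reading the CPU time before
the wall time. The variable names cpu_elapsed and wall_elapsed are only
for illustration, and N is reduced to keep the run short.

```octave
N = 300;   % smaller than the original 1000, only to keep the run short
t0 = tic; t1 = cputime;
rand(N)*rand(N);
cpu_elapsed  = cputime - t1;   % CPU time is read first
wall_elapsed = toc(t0);        % wall time is read second
printf('wall time: %.6f | cpu time: %.6f\n', wall_elapsed, cpu_elapsed);
```

If the small excess of CPU time over wall time disappears with this
ordering, the order of the two calls would be the likely explanation.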

Olaf

-- 
public key id EAFE0591, e.g. on x-hkp://pool.sks-keyservers.net
