help-octave

## Re: cputime and parallel computing

From: Anton Dereventsov
Subject: Re: cputime and parallel computing
Date: Mon, 10 Sep 2018 20:51:14 -0400

Thanks, that makes sense.

> Is there really a need for measuring cpu usage within
> parcellfun/pararrayfun?

I'm trying to compare the computational cost of two algorithms in a simple
and intuitive way, and I think CPU time suits this purpose: it reflects the
amount of computation an algorithm performs and depends less on the
implementation than wall-clock time does.

> I'm not sure, but the difference is so small, that I would attribute
> it to calling cputime() _after_ toc(). Have you tried it in reversed
> order?

I tried that, but the CPU time is still slightly larger than the wall time.

Anton

On Sat, Sep 8, 2018 at 7:58 AM, Olaf Till wrote:
On Thu, Sep 06, 2018 at 02:35:03PM -0500, Anton Dereventsov wrote:
> I've noticed that cputime() does not measure time correctly with the parallel
> package.
> For instance, in the following script I multiply two random 1000-by-1000
> matrices 10 times and time the execution using tic-toc and cputime. I use a
> standard 'for' cycle, an 'arrayfun' function, and a 'pararrayfun' function:
>
>       N = 1000;
>       K = 10;
>
>       t0 = tic; t1 = cputime;
>               for i = 1 : K
>                       rand(N)*rand(N);
>               endfor
>       printf('wall time: %.6f | cpu time: %.6f\n', toc(t0), cputime-t1);
>
>       t0 = tic; t1 = cputime;
>               arrayfun(@(n) rand(N)*rand(N), 1:K, 'UniformOutput', 0);
>       printf('wall time: %.6f | cpu time: %.6f\n', toc(t0), cputime-t1);
>
>       t0 = tic; t1 = cputime;
>               pararrayfun(nproc, @(n) rand(N)*rand(N), 1:K, 'VerboseLevel', 0);
>       printf('wall time: %.6f | cpu time: %.6f\n', toc(t0), cputime-t1);
>
> Here is the output:
>       wall time: 5.190973 | cpu time: 5.228000
>       wall time: 5.373872 | cpu time: 5.404000
>       wall time: 3.517548 | cpu time: 0.116000
>
> This code was executed in Octave 4.4.1 on Ubuntu 16.04 and the results are
> consistent with each run, however the last cpu time measurement does not
> seem to be correct. I would expect all three cpu times to be very similar
> since the computed problem is the same. Moreover, the last measurement is
> far too small (it becomes even more obvious if N is increased).

No, that's what is to be expected... in pararrayfun, the code is
executed in (nproc) extra processes. Your 'cputime-t1' after calling
pararrayfun measures the cpu usage of the outer process, which doesn't
do the multiplications, but only waits for the results.

> Is there a way to correctly measure cpu time on parallelized chunks
> of code?

Currently there isn't; this would have to be done inside (the code of)
pararrayfun. But the sum of the usage times of the different processor
cores should be similar to the single core cpu usage without
parallelization, even if the overall processing time is shorter. Is
there really a need for measuring cpu usage within
parcellfun/pararrayfun?
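
As an illustration of the point about summed per-core usage, one possible workaround is to take the cputime() difference inside the function that each worker runs, return it, and add the values up in the outer process. This is only a sketch, not a feature of the parallel package: the helper name timed_multiply is hypothetical, and it assumes the parallel package is installed and that cputime() in a worker reports that worker's own usage.

```octave
pkg load parallel   % assumes the parallel package is installed

N = 1000;
K = 10;

% Hypothetical helper: performs one multiplication and returns the CPU
% time that the calling process (here, a worker) spent on it.
function dt = timed_multiply (N)
  t = cputime;
  rand (N) * rand (N);
  dt = cputime - t;
endfunction

t0 = tic; t1 = cputime;
% Each call returns a scalar, so the result is a 1-by-K vector.
worker_cpu = pararrayfun (nproc, @(n) timed_multiply (N), 1:K, 'VerboseLevel', 0);
outer_cpu = cputime - t1;   % CPU time of the outer (waiting) process only
wall = toc (t0);

printf ('wall time: %.6f | outer cpu time: %.6f | summed worker cpu time: %.6f\n',
        wall, outer_cpu, sum (worker_cpu));
```

The summed value covers only the time spent inside timed_multiply; process startup and the transfer of results back to the outer process are not included.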

> Also, how come that the cpu time is slightly bigger than the wall time in
> the first two measurements even though they are executed on a single core?

I'm not sure, but the difference is so small, that I would attribute
it to calling cputime() _after_ toc(). Have you tried it in reversed
order?
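
The reversed order could be tested along these lines. It is a sketch that reuses N, K, and the loop from the original script; the variable names wall and cpu are introduced here for illustration. The wall clock is started first and stopped last, so the wall-clock interval fully encloses the CPU-time interval.

```octave
N = 1000;
K = 10;

t0 = tic;             % start the wall clock first
t1 = cputime;         % then take the starting CPU time
for i = 1 : K
  rand (N) * rand (N);
endfor
cpu  = cputime - t1;  % read the CPU time first
wall = toc (t0);      % stop the wall clock last

printf ('wall time: %.6f | cpu time: %.6f\n', wall, cpu);
```

If the CPU time still comes out larger with this ordering, the call order is probably not the explanation, and the resolution of the two clocks or additional threads used by the process would be the next things to check.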

Olaf

--
public key id EAFE0591, e.g. on x-hkp://pool.sks-keyservers.net