octave-maintainers

Re: Parallel access to data created with Octave


From: Jaroslav Hajek
Subject: Re: Parallel access to data created with Octave
Date: Mon, 3 May 2010 17:14:39 +0200

On Sat, May 1, 2010 at 5:25 AM, Jarno Rajahalme <address@hidden> wrote:
> I have for quite some time now wanted to parallelize some OCT-file C++ code 
> using OpenMP. My code uses some global Octave data (Cells of Cells of vectors 
> or scalars). I had naively thought that even though Octave is not thread 
> safe, "just reading" the global variables would be thread-safe :-)
>
> It took some debugging, but I found out that, in order to avoid any dynamic
> memory allocations while traversing the data, I must:
>
> - use data(), get_rep(), and mex_get_data() to get pointers to the data
> (otherwise temporaries are dynamically allocated and freed, which messes
> things up badly)
> - In Cells, I had to index manually using .data()[idx].get_rep() to avoid
> temporary references
> - NOT use dims(), columns(), rows(), length(), or numel() on scalars. These
> are safe on matrices, but NOT on scalars. I have to inspect the data with
> is_scalar_type(); if it is a scalar, the length is 1, otherwise numel()
> gives the vector length
> - results have to be gathered via a raw pointer; parallel assignments to
> e.g. a RowVector seem to fail even if the space is preallocated (see the
> sketch below).
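>
> Something along these lines (an illustrative sketch only; compute() and n
> are placeholder names, not my actual code):
>
>  RowVector result(n);                  // preallocated before the parallel loop
>  double * rp = result.fortran_vec();   // gather results through this raw pointer
>  #pragma omp parallel for
>  for (octave_idx_type i = 0; i < n; i++)
>    rp[i] = compute(i);                 // each iteration writes only its own slot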
>
> The data I used was all uint16 matrices and uint16 scalars.
>
> The numel() on a scalar type was a hard one to track down. There is a static
> dv initialization within the scalar dims() function, which was a real pain.
> At first I added dummy length(), numel() and capacity() functions (all
> returning 1), but as I wanted my code to be portable, I deleted these changes
> from ov-base-scalar.h and used is_scalar_type() instead.
>
> For example, this crashes when run in parallel (pCellStore is a pointer to a
> global Cell, arg1 is an Octave index):
>
>  const Cell & su = pCellStore->xelem(arg1-1).cell_value();
>  const octave_idx_type smax = su.columns();
>
> This is a bit faster, but still crashes when run in parallel:
>
>  const Cell su(pCellStore->xelem(arg1-1).cell_value());
>  const octave_idx_type smax = su.columns();
>
> But this does not crash:
>
>  const octave_cell * sup = static_cast<const octave_cell *>(&(pCellStore->data()[arg1-1].get_rep()));
>  const octave_value * suovp = static_cast<const octave_value *>(sup->mex_get_data());
>  const octave_idx_type smax = sup->columns();
>  const octave_idx_type srows = sup->rows();
>
>
> And this is simple, but slow, and crashes when run in parallel (si is a loop
> variable):
>
>  const uint16NDArray & sn = su.xelem(0,si).uint16_array_value();
>
> This is faster, but still crashes when run in parallel:
>
>  const octave_value snv(su.xelem(0,si));
>  const uint16NDArray sna(snv.uint16_array_value());
>  const octave_idx_type snl = sna.nelem();
>  const octave_uint16 * snp = sna.fortran_vec();
>
> But this does not crash:
>
>  const octave_base_value * snvp(suovp[srows*si].internal_rep()); // either octave_uint16_matrix or octave_uint16_scalar
>  const octave_idx_type snl(snvp->is_scalar_type() ? 1 : snvp->numel());
>  const octave_uint16 * snp(static_cast<const octave_uint16 *>(snvp->mex_get_data())); // works for both
>
> It gets a bit messy, but it is a lot faster, and it also works with Octave
> 3.2.3 when compiling the oct-file with OpenMP :-)
>
> Overall, with these changes the code became about 1.8x as fast (1 core) and
> 3.4x as fast (2 cores) as my original code, in which I had followed the
> examples in the Octave manual as best I could.  So there is a lot of overhead
> in "just accessing" data in OCT files, if one is not careful!
>
> My original C++ itself was more than 145x faster than the equivalent,
> optimized .m version. Now the speed difference is a whopping 500x. This must
> be about the worst-case comparison for Octave, as there are no matrix
> operations and there are 6 levels of for loops involved...
>

The problem is that your code may stop working essentially any time
because of internal changes, and it may be hard to figure out what is
wrong. For instance, in Octave 3.3+, a double real value with
is_scalar_type() = true is *not* necessarily an octave_scalar
instance.

I would suggest you try to hoist as many of the value extraction calls
as possible out of the parallelized loops. If that's not possible, you
can use #pragma omp critical. Perhaps such pragmas will be inserted
into Octave's sources in the future.
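
For instance, something along these lines (an untested sketch; v, out and
process() are placeholder names, and out is assumed to point to
preallocated output storage):

  // Do all octave_value extraction serially, before the parallel region,
  // so that the loop body touches only plain C++ data.
  const uint16NDArray a = v.uint16_array_value();
  const octave_uint16 * ap = a.data();
  const octave_idx_type n = a.nelem();

  #pragma omp parallel for
  for (octave_idx_type i = 0; i < n; i++)
    out[i] = process(ap[i]);   // no Octave calls inside the loop

If an extraction really cannot be hoisted, wrap just that call in a
#pragma omp critical block so that only one thread executes it at a time.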

-- 
RNDr. Jaroslav Hajek, PhD
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


