octave-maintainers

Re: indexing improvements - advice wanted


From: Jaroslav Hajek
Subject: Re: indexing improvements - advice wanted
Date: Wed, 29 Oct 2008 20:05:44 +0100

On Wed, Oct 29, 2008 at 7:24 PM, John Swensen <address@hidden> wrote:
>
> On Oct 25, 2008, at 3:12 PM, Jaroslav Hajek wrote:
>
>> Hi all,
>>
>> I'm currently working on a patch improving the indexing & indexed
>> assignment as well as cleaning up the
>> related interface in Array<T>.
>> A short summary of intended changes:
>>
>> 1. The idx_vector class is almost completely rewritten. Instead of
>> storing all possible information at once and using conditionals,
>> polymorphism is used to represent the different types of indexing
>> (colon, range, scalar, vector). Several redundant operations were
>> removed, resulting in less overhead when creating idx_vector objects,
>> and caching is done in the constructors.
>> A pleasant consequence is that const idx_vector& arguments can be used
>> rather than writable references. Another improvement is that,
>> internally, zero-based index arrays (Array<octave_idx_type>) are
>> converted to idx_vector without actually copying the data.
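>>
>> To give a rough idea of what this polymorphic representation means,
>> here is a minimal C++ sketch (the names index_rep, xelem, etc. are just
>> for illustration, not the actual classes in the patch): each kind of
>> index gets its own representation behind a common interface, so the
>> hot paths dispatch once per index instead of testing flags per element.
>>
>> // hypothetical sketch, not Octave's actual idx_vector
>> #include <cstddef>
>>
>> typedef std::ptrdiff_t idx_type;
>>
>> struct index_rep
>> {
>>   virtual ~index_rep () { }
>>   virtual idx_type length (idx_type n) const = 0; // selected count
>>   virtual idx_type xelem (idx_type i) const = 0;  // i-th zero-based index
>> };
>>
>> struct colon_rep : public index_rep   // A(:): select everything
>> {
>>   idx_type length (idx_type n) const { return n; }
>>   idx_type xelem (idx_type i) const { return i; }
>> };
>>
>> struct range_rep : public index_rep   // A(a:s:b): start/stride, no array
>> {
>>   idx_type start, stride, len;
>>   range_rep (idx_type b, idx_type s, idx_type l)
>>     : start (b), stride (s), len (l) { }
>>   idx_type length (idx_type) const { return len; }
>>   idx_type xelem (idx_type i) const { return start + i*stride; }
>> };
>>
>> struct scalar_rep : public index_rep  // A(i): a single element
>> {
>>   idx_type val;
>>   scalar_rep (idx_type v) : val (v) { }
>>   idx_type length (idx_type) const { return 1; }
>>   idx_type xelem (idx_type) const { return val; }
>> };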
>>
>> 2. The index & assign methods now take a completely different approach.
>> Instead of the previous looping over all elements using
>> increment_index, a recursive procedure is used (this should be more
>> amenable to possible parallelization in the future). The bottom-level
>> loops are always specialized for each idx_vector type, translating the
>> operation into std::copy, std::reverse_copy, std::fill or a fast loop.
>> A similar approach is taken for resize, which optimally translates to a
>> sequence of std::copy and std::fill operations.
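>>
>> As a rough illustration of what "specialized bottom-level loops" means
>> (the function below is made up for the example, not code from the
>> patch): for a range index, the unit-stride, reversed and general cases
>> map onto std::copy, std::reverse_copy and a plain fast loop.
>>
>> // hypothetical sketch of one specialized bottom-level loop
>> #include <algorithm>
>> #include <cstddef>
>>
>> typedef std::ptrdiff_t idx_type;
>>
>> template <class T>
>> void copy_range (const T *src, T *dest,
>>                  idx_type start, idx_type stride, idx_type len)
>> {
>>   // assumes a valid, in-bounds range index start + i*stride, i < len
>>   if (stride == 1)
>>     std::copy (src + start, src + start + len, dest);
>>   else if (stride == -1)
>>     std::reverse_copy (src + start - len + 1, src + start + 1, dest);
>>   else
>>     for (idx_type i = 0; i < len; i++)
>>       dest[i] = src[start + i*stride];
>> }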
>>
>> 3. Before doing the actual work, reductions are applied to multiple
>> indices in an attempt to reduce the dimensionality of the operation.
>> For instance, A(:,:,a:b) is automatically translated into a single
>> contiguous range, and is thus done by a single std::copy. Similarly,
>> A(i,j,:,k) is translated to a single range and is thus done using a
>> single loop (as if using sub2ind).
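>>
>> To make the A(:,:,a:b) reduction concrete, here is a hand-worked sketch
>> (again just an illustration, not code from the patch), assuming the
>> usual column-major storage of an m-by-n-by-p array: the leading colons
>> collapse into the trailing range, so the whole extraction becomes one
>> contiguous block copy.
>>
>> // hypothetical sketch: extract A(:,:,lo:hi) in one std::copy
>> #include <algorithm>
>> #include <cstddef>
>>
>> typedef std::ptrdiff_t idx_type;
>>
>> template <class T>
>> void extract_pages (const T *a, T *b, idx_type m, idx_type n,
>>                     idx_type lo, idx_type hi)  // 1-based range on dim 3
>> {
>>   idx_type page = m*n;                    // elements in one page A(:,:,k)
>>   idx_type off = page * (lo - 1);         // offset of A(:,:,lo)
>>   idx_type len = page * (hi - lo + 1);    // elements in A(:,:,lo:hi)
>>   std::copy (a + off, a + off + len, b);  // a single block copy
>> }
>>
>> This is essentially why the a(:,:,5:end-5) cases in the timings below
>> collapse to a single block operation.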
>>
>> The current state of the patch is attached. To illustrate the
>> performance improvements, I tried the following code:
>> n = 500;
>> a = ones (n,n,n);
>> tic; b = a(5:end-5); toc
>> tic; b = a(:,:,5:end-5); toc
>> a = ones (2, 3, n^3 / 6);
>> tic; for i = 1:5; b = a(2,2,:); endfor; toc
>> a = ones (n,n,n);
>> tic; a(5:end-5) = 1; toc
>> tic; a(:,:,5:end-5) = 1; toc
>> b = zeros (n,n,n-9);
>> tic; a(:,:,5:end-5) = b; toc
>> tic; a (n + 10, n + 10, n + 10) = 2; toc
>>
>> (set n to a suitable value according to available memory)
>>
>> using recent tip build, I got:
>> Elapsed time is 2.09076 seconds.
>> Elapsed time is 38.0217 seconds.
>> Elapsed time is 34.0547 seconds.
>> Elapsed time is 0.756007 seconds.
>> Elapsed time is 32.9049 seconds.
>> Elapsed time is 33.4742 seconds.
>> Elapsed time is 7.66801 seconds.
>>
>> whereas using the new patch, I got:
>> Elapsed time is 1.25063 seconds.
>> Elapsed time is 1.34406 seconds.
>> Elapsed time is 2.45575 seconds.
>> Elapsed time is 0.509049 seconds.
>> Elapsed time is 0.490327 seconds.
>> Elapsed time is 0.761768 seconds.
>> Elapsed time is 1.46046 seconds.
>>
>> So, as you can see, the performance speed-ups range from about 80% to
>> as much as 60x in some cases.
>>
>> If you try the patch, you will notice a number of test failures in the
>> test suite. Apart from a couple of persisting bugs (e.g. cell2mat for
>> some reason cannot concatenate some arrays), some failures are caused
>> by changes in error messages (for an invalid resize-on-assignment, it
>> is now resize that gripes, so the message is different).
>> A number of other failing test cases rely on behaviour that currently
>> works in Octave but fails in Matlab. For instance, this works in
>> Octave but not in Matlab:
>> x = ones (2,0);
>> x([],:,:) = ones (2,0);
>>
>> I believe Matlab is right here, because the rule of thumb for
>> assignment is that, after omitting singleton dimensions on the right
>> and scalar subscripts on the left, the dimensions of the LHS and RHS
>> should match. Here, the first index has extent zero while the leading
>> RHS dimension is 2, so there is a mismatch.
>>
>> So, my questions (or opinion inquiries) are:
>>
>> 1. How desirable is backward compatibility of error messages? In the
>> current patch, I let Array<T>::resize () gripe when an invalid resize
>> is requested (e.g. A(10) = 1 with A = ones(3)). However, the error
>> message changes in that case; otherwise, an argument to resize would
>> be needed to distinguish whether it is called from Array<T>::assign or
>> from elsewhere. This is certainly possible, but is it worth cluttering
>> interfaces with optional arguments just to keep error messages
>> backward compatible? (Note that they are by no means Matlab
>> compatible, but that is not reasonable to expect anyway.)
>>
>> 2. What about the failing tests that rely on Matlab-incompatible (and
>> slightly illogical) behaviour? Should the tests be fixed, or should I
>> try to adapt the code so that it behaves in a backward-compatible
>> manner? Note that these are rather corner cases, quite unlikely to
>> occur in real code.
>>
>> and, of course, if anyone discovers the source of the concatenation
>> bug, I'll be grateful :)
>>
>>
>> PS: the attachment is > 100 K, so you can get it here:
>> http://artax.karlin.mff.cuni.cz/~hajej2am/ulozna/array-new.diff
>>
>>
>> cheers
>>
>> -- RNDr. Jaroslav Hajek
>> computing expert
>> Aeronautical Research and Test Institute (VZLU)
>> Prague, Czech Republic
>> url: www.highegg.matfyz.cz
>
>
> FYI, doing a little extrapolation (my values when running the script took
> ~1/3 as long as yours for all computations) and assuming my speedups would
> be proportionally faster, I compared it to how long the same script takes
> in Matlab. With the speedups, Octave's execution time is comparable,
> whereas before it was significantly slower.
>

Great! Unfortunately I can't compare Matlab vs. Octave on the same
machine (I have access to a machine with Matlab, but the disk quota
prevents me from compiling Octave there), so I couldn't test this myself.
Could you maybe share the numbers? I'm wondering if there are still
cases (in dense indexing) where we significantly lag behind. For
instance, we could use multiple threads for long operations (once we
settle on what MT tools to adopt for Octave). I'm not sure how
multithreaded Matlab is, though.

> A big round of applause for someone who took the time to do some dirty work
> on something that already worked, but obviously could have worked better!!!
>  This is awesome news.
>
> John Swensen
>



-- 
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

