On Wed, Jan 14, 2009 at 5:00 AM, Nicholas Tung <
address@hidden> wrote:
> On Tue, Jan 13, 2009 at 16:46, Francesco Potortì <
address@hidden>
> wrote:
>>
>> >I've noticed that indexing expressions on large arrays can be quite slow,
>>
>> Have a look at this, which I wrote in November but for some reason does
>> not appear in the list archives. In an application of mine, I wrote an
>> ad hoc function to access a 5-dim matrix using the last method below,
>> getting a significant speedup. I usually run 3.0.1, and I did not make
>> any check to compare its speed with the more recent Octave versions.
>>
>> |Date: 11 Nov 2008 10:44:45 +0100
>> |From: Francesco Potortì <
address@hidden>
>> |To: Jaroslav Hajek <
address@hidden>
>> |CC: Octave help list <
address@hidden>
>> |Subject: Re: slow access to consecutive matrix elements
>> |
>> |>On Sat, Oct 11, 2008 at 10:22 AM, Francesco Potorti`
>> |><
address@hidden> wrote:
>> |>> This is a real life example that demonstrates how access to matrix
>> |>> elements is not optimised for the case of complete ranges in the first
>> |>> dimensions. I am sending it here and not to the bug list because this
>> |>> example may be useful to others, at least until this cases is
>> optimised
>> |>> in the sources.
>> |>>
>> |>> # This is a big 5-dim matrix, about 100MB size
>> |>> octave> kk=rand(156,222,1,44,8);
>> |>>
>> |>> # I access a slice of it, which is sequential in memory
>> |>> octave> t=cputime; for ii=1:44, for jj=1:8
>> |>> mm=kk(:,:,:,ii,jj); endfor, endfor, cputime-t
>> |>> ans = 5.9124
>> |>>
>> |>> # Using a linear index is much faster
>> |>> octave> span=(1:156*222);
>> |>> t=cputime; for ii=1:44, for jj=1:8
>> |>> mm=kk(sub2ind(size(kk),1,1,1,ii,jj)+span-1); endfor, endfor,
>> cputime-t
>> |>> ans = 2.6642
>> |>>
>> |>> # Removing the call to sub2ind reaches top speed
>> |>> octave> cp=[1,cumprod(size(kk)(1:end-1))]; span=(1:156*222);
>> |>> t=cputime; for ii=1:44, for jj=1:8
>> |>> mm=kk(sum(([1,1,1,ii,jj]-1).*cp)+span); endfor, endfor, cputime-t
>> |>> ans = 1.4001
>> |>>
>> |>> Still, I wish access were faster yet. Is there a reason why copying
>> |>> consecutive memory is so slow? I wish I could help with optimising
>> |>> this, even if I am certainly not the most qualified person to do it.
>> |>>
>> |>
>> |>I guess the indexing routines do deserve some attention w.r.t
>> |>performance. Reducing code duplication would also be nice. I have this
>> |>on my TODO list, but I don't think it's a good idea to do it before
>> |>3.2.x is forked, as such changes are, IMO, likely to introduce bugs.
>> |
>> |Follwoing up on this, I realised that there is room for further
>> |significant speedup:
>> |
>> |# Be careful to not convert ranges into matrices
>> |octave> cp=[1,cumprod(size(kk)(1:end-1))]; len=156*222;
>> | t=cputime; for ii=1:44, for jj=1:8
>> |base=sum(([1,1,1,ii,jj]-1).*cp); mm=kk(base+1:base+len); endfor, endfor,
>> cputime-t
>> |ans = 0.26402
>> |
>> |The fact is, I was discounting the fact that a range remains a range
>> |even after linear transformation, while this does not appear to be the
>> |case:
>> |
>> |octave3.1> typeinfo(1:4)
>> |ans = range
>> |octave3.1> typeinfo(4+(1:4))
>> |ans = matrix
>> |octave3.1> typeinfo(4*(1:4))
>> |ans = matrix
>> |
>> |>From the depth of my ignorance about Octave's internals, I dare say that
>> |it should not be too difficult to keep ranges as ranges even after sum
>> |or product with a scalar. Maybe even after sum with a range with the
>> |same number of elements. Am I wrong?
>
> neat, I didn't know that 1-d vector indexing worked on matrices. However, I
> think it may be faster for you because of the multi-dimensional aspect,
> whereas I am working with more values (potentially large image samples). On
> Octave 3.1.51+ (built today) the following code runs in about the same time.
> If you're doing something more complicated, please tell me. I know the two
> snippets below don't compute exactly the same value...
>
> tic, x1 = 1:299; x2 = 2:300; for i = 1:100, a(:, x1) += a(:, x2); endfor,
> toc
> x1 = 1:89700; x2 = 301:90000; tic, for i = 1:100, a(x1) += a(x2); endfor,
> toc
>
> As another note, it doesn't seem to matter that much which way the matrix is
> indexed for the first, i.e. a(x1, :) vs a(:, x1).
>
> Jaroslav - I updated to hg head and it was faster (cut time in half for the
> test example), but on real code not quite as much as I'd hoped.
>
> Nicholas
>