help-octave

Re: oct file: slower than expected using fortran_vec


From: Jordi Gutiérrez Hermoso
Subject: Re: oct file: slower than expected using fortran_vec
Date: Mon, 16 May 2011 12:22:13 -0500

Please respond publicly (hit "reply all" instead of "reply").

2011/5/16 Seb Astien <address@hidden>:
> 2011/5/16 Jordi Gutiérrez Hermoso <address@hidden>:
>> On 15 May 2011 15:10, Seb Astien <address@hidden> wrote:
>>> Hi,
>>>
>>> I have been experimenting a bit with oct files to see how much faster
>>> they would be.
>>> I wrote a trivial example to sum the elements of an array.
>>> To my surprise, this was still much slower than the built-in functions:
>>
>> Of course, your C++ code isn't vectorised. Delegate the sum to the
>> BLAS instead, which will use vector instructions. Sometimes gcc can
>> vectorise, depending on compiler flags (which ones did you use?), but
>> not always.
>>

>
> Thank you for your answer.
> To me, the whole point of writing an oct extension in the first place
> is for code that cannot be vectorised!
> In such cases, Octave becomes really slow.

Vectorisation isn't a technique limited to the Octave language. Yes,
we have to vectorise Octave code because the interpreter is slow, but
even if some day we improve it and implement JIT for Octave,
vectorised code will still be faster, just as it's faster in your
example without invoking the Octave interpreter.
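To illustrate that this applies to compiled C++ too, here is a minimal sketch (not code from this thread; the function names `sum_scalar` and `sum_four_lanes` are hypothetical). The first version keeps a single running total, so each addition depends on the previous one. The second keeps four independent partial sums, which is the shape SIMD hardware executes in parallel. Whether gcc actually emits vector instructions for either one depends on the compiler version and flags.

```cpp
#include <cassert>
#include <cstddef>

// Straightforward reduction: a single running total, so every addition
// has to wait for the result of the previous one.
double sum_scalar(const double* a, std::size_t n) {
  assert(a != nullptr || n == 0);
  double s = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    s += a[i];
  return s;
}

// Same reduction with four independent partial sums ("lanes"). The four
// additions in the loop body do not depend on each other, so they can be
// carried out side by side.
double sum_four_lanes(const double* a, std::size_t n) {
  assert(a != nullptr || n == 0);
  double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    s0 += a[i];
    s1 += a[i + 1];
    s2 += a[i + 2];
    s3 += a[i + 3];
  }
  // Leftover elements when n is not a multiple of 4.
  for (; i < n; ++i)
    s0 += a[i];
  return (s0 + s1) + (s2 + s3);
}
```

Note that the two versions add the elements in a different order, so for general floating-point data their results can differ in the last few bits. For small integer-valued inputs such as {1, 2, 3, 4, 5}, both return exactly 15.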

A library that implements BLAS will typically carry out operations
such as this sum with *hardware* vector instructions (on Intel
processors these include, for example, the SSE family of instructions).
That's why a library built from the appropriate vector opcodes is still
faster than a plain scalar loop, even when both are compiled code. The
holy grail of vectorisation is to do this automatically, which gcc
sometimes attempts, but there are aliasing subtleties (the compiler
cannot always prove that two pointers don't refer to overlapping
memory) that sometimes make it difficult.
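The aliasing issue can be sketched in C++ as follows. This is an illustration under stated assumptions, not code from the thread: the function names are hypothetical, and `__restrict__` is a GCC/Clang extension rather than standard C++.

```cpp
#include <cassert>
#include <cstddef>

// The compiler must assume `out` and `in` might overlap: a store to out[i]
// could change a later in[j]. That possibility may force gcc either to skip
// vectorisation or to emit a runtime overlap check before a vector loop.
void scale_may_alias(double* out, const double* in, double k, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    out[i] = k * in[i];
}

// __restrict__ (GCC/Clang extension) promises that the two pointers never
// refer to overlapping memory, so the compiler may load and store several
// elements at once. Breaking the promise is undefined behaviour.
void scale_no_alias(double* __restrict__ out, const double* __restrict__ in,
                    double k, std::size_t n) {
  // Catches only the most obvious violation (the same buffer passed twice);
  // it does not detect partial overlap.
  assert(n == 0 || out != in);
  for (std::size_t i = 0; i < n; ++i)
    out[i] = k * in[i];
}
```

The first function must stay correct even for overlapping calls such as `scale_may_alias(buf + 1, buf, 1.0, 3)`, which copies `buf[0]` forward element by element. Preserving that behaviour is exactly what makes automatic vectorisation harder.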

Consult for example this Wikipedia page, and note how it's not talking
about Matlab or Octave:

     http://en.wikipedia.org/wiki/Vectorization_%28parallel_computing%29

> My worry is that somehow the arguments are being fully copied. Could
> someone confirm this and tell me how to avoid it?

No, it's not about copying; it's that gcc isn't translating your
example into vector instructions. Some compiler flags can help. For
example, try the -ftree-vectorizer-verbose=1 option to get a
diagnostic of whether gcc was able to vectorise your loop.
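A minimal translation unit for trying this out might look like the sketch below (hypothetical file and function names, not from the original thread). One relevant detail: gcc will normally not vectorise a floating-point sum unless it is allowed to reassociate the additions (for example via -ffast-math), because regrouping them can change the rounded result. Integer sums have no such restriction. In later GCC releases, -ftree-vectorizer-verbose was deprecated in favour of the -fopt-info-vec family of options.

```cpp
// sum_loop.cc: check whether gcc vectorises these loops, e.g.
//
//   g++ -O3 -ftree-vectorizer-verbose=1 -c sum_loop.cc
//   g++ -O3 -ffast-math -ftree-vectorizer-verbose=1 -c sum_loop.cc
//
// (newer GCC: replace -ftree-vectorizer-verbose=1 with -fopt-info-vec)
#include <cassert>
#include <cstddef>

// Floating-point reduction: reordering these additions can change the
// rounded result, so gcc keeps them in source order unless -ffast-math
// (which enables -fassociative-math) permits regrouping.
double sum_double(const double* a, std::size_t n) {
  assert(a != nullptr || n == 0);
  double s = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    s += a[i];
  return s;
}

// Integer reduction: regrouping integer additions does not change the
// result, so the compiler may split this loop across vector lanes.
long long sum_ll(const long long* a, std::size_t n) {
  assert(a != nullptr || n == 0);
  long long s = 0;
  for (std::size_t i = 0; i < n; ++i)
    s += a[i];
  return s;
}
```

Comparing the diagnostics from the two invocations above is a quick way to see whether the floating-point loop is being held back by the reassociation rule rather than by copying or by the oct-file interface.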

HTH,
- Jordi G. H.
