help-octave
Re: oct file: slower than expected using fortran_vec


From: Seb Astien
Subject: Re: oct file: slower than expected using fortran_vec
Date: Mon, 16 May 2011 20:35:11 +0200

On Mon, May 16, 2011 at 7:24 PM, John W. Eaton <address@hidden> wrote:
> On 16-May-2011, Seb Astien wrote:
>
> | It is a bit faster the second time, but it does not explain the gap
> | between the two:
> | The bigger the matrix, the bigger the gap between built-in function
> | and the oct one.
> | I suspect a copy is taking place somewhere.
>
> Yes, because when you write
>
>  NDArray A = args(0).array_value ();
>
> you grab a second reference to the underlying array data.  Then when
> you do
>
>  const double *p = A.fortran_vec ();
>
> you are forcing a copy.  The const on the LHS is not what determines
> whether or not the const version of Array::fortran_vec is selected
> over the non-const version.  That selection is based on whether the
> method is called on a const object.
>
> If you want to avoid the copy, then you should write
>
>  const NDArray A = args(0).array_value ();
>  const double *p = A.fortran_vec ();
>
> or
>
>  NDArray A = args(0).array_value ();  // could also declare A to be const
>  const double *p = A.data ();
>
> In the latter case, it does not matter whether A is const; no copy is
> ever made with the Array::data method.
>
> jwe
>

Thank you so much. It is what I was missing.
Indeed, now:

octave:1> A=rand(10000,10000);
octave:2> tic; s1=sum(sum(A)); toc
Elapsed time is 0.205881 seconds.
octave:3> tic; s2=sumit(A); toc
Elapsed time is 0.203057 seconds.

So it is now slightly faster than the built-in (0.203 s versus 0.206 s),
which is what I expected, since the built-in first sums each column and
then sums the results.

By the way, the code I posted earlier was erroneous; the corrected version follows:

----
#include <octave/oct.h>

DEFUN_DLD (sumit, args, , "Sum elements of an array")
{
        int nargin = args.length();
        
        if (nargin != 1){
                print_usage();
                return octave_value_list();
        }
        
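        // A is const, so the const version of fortran_vec is selected
        // below and no copy of the data is made.
        const NDArray A = args(0).array_value();
        
        // array_value() sets error_state if the argument is not numeric.
        if (error_state)
                return octave_value_list();
        
        const double *p = A.fortran_vec();
        double sum = 0;
        octave_idx_type N = A.nelem();
        
        // Single pass over all elements in storage order.
        for(octave_idx_type i=0; i<N; i++)
                sum += *(p++);
        
        return octave_value (sum);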
}
----

Seb

