Re: oct file: slower than expected using fortran_vec
From: Seb Astien
Subject: Re: oct file: slower than expected using fortran_vec
Date: Mon, 16 May 2011 20:35:11 +0200
On Mon, May 16, 2011 at 7:24 PM, John W. Eaton <address@hidden> wrote:
> On 16-May-2011, Seb Astien wrote:
>
> | It is a bit faster the second time, but it does not explain the gap
> | between the two:
> | The bigger the matrix, the bigger the gap between built-in function
> | and the oct one.
> | I suspect a copy is taking place somewhere.
>
> Yes, because when you write
>
> NDArray A = args(0).array_value ();
>
> you grab a second reference to the underlying array data. Then when
> you do
>
> const double *p = A.fortran_vec ();
>
> you are forcing a copy. The const on the LHS is not what determines
> whether or not the const version of Array::fortran_vec is selected
> over the non-const version. That selection is based on whether the
> method is called on a const object.
>
> If you want to avoid the copy, then you should write
>
> const NDArray A = args(0).array_value ();
> const double *p = A.fortran_vec ();
>
> or
>
> NDArray A = args(0).array_value (); // could also declare A to be const
> const double *p = A.data ();
>
> In the latter case, it does not matter whether A is const; no copy is
> ever made with the Array::data method.
>
> jwe
>
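The overload rule described above can be illustrated outside Octave with a small standalone sketch. This is only a toy model, assuming a reference-counted, copy-on-write array: `CowArray`, its `copies` counter, and the three helper functions are hypothetical names made up for the illustration, not Octave's actual Array implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical toy class: a reference-counted, copy-on-write array.
// It only mimics the behaviour discussed above; it is not Octave's Array<T>.
class CowArray
{
public:
  explicit CowArray (std::size_t n, double fill = 0.0)
    : rep (std::make_shared<std::vector<double>> (n, fill)) { }

  // Const overload: read-only access, never copies.
  const double *fortran_vec () const { return rep->data (); }

  // Non-const overload: the caller may write through the pointer, so the
  // data must be made unique first if it is currently shared.
  double *fortran_vec ()
  {
    if (rep.use_count () > 1)
      {
        rep = std::make_shared<std::vector<double>> (*rep);
        ++copies;
      }
    return rep->data ();
  }

  // Read-only access regardless of the constness of the object.
  const double *data () const { return rep->data (); }

  static int copies;  // number of deep copies made so far

private:
  std::shared_ptr<std::vector<double>> rep;
};

int CowArray::copies = 0;

// Each helper returns the number of deep copies its access pattern triggers.
int copies_with_nonconst_object ()
{
  CowArray original (1000, 1.0);
  CowArray a = original;               // second reference to the same data
  int before = CowArray::copies;
  const double *p = a.fortran_vec ();  // a is non-const: non-const overload
  (void) p;
  return CowArray::copies - before;
}

int copies_with_const_object ()
{
  CowArray original (1000, 1.0);
  const CowArray a = original;         // second reference, but const
  int before = CowArray::copies;
  const double *p = a.fortran_vec ();  // a is const: const overload
  (void) p;
  return CowArray::copies - before;
}

int copies_with_data ()
{
  CowArray original (1000, 1.0);
  CowArray a = original;               // non-const, shared
  int before = CowArray::copies;
  const double *p = a.data ();         // data () has only a const version
  (void) p;
  return CowArray::copies - before;
}
```

In this toy model the first helper reports one deep copy and the other two report none, which mirrors the three cases in the reply: the `const` on the pointer type plays no role, only the constness of the object the method is called on.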
Thank you so much. That is what I was missing.
Indeed, now:
octave:1> A=rand(10000,10000);
octave:2> tic; s1=sum(sum(A)); toc
Elapsed time is 0.205881 seconds.
octave:3> tic; s2=sumit(A); toc
Elapsed time is 0.203057 seconds.
So it is slightly faster than the built-in, which I was expecting since the
built-in first sums each column and then sums the results.
By the way, the code I posted earlier contained errors; the corrected version follows:
----
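#include <octave/oct.h>

DEFUN_DLD (sumit, args, , "Sum elements of an array")
{
  int nargin = args.length ();
  if (nargin != 1)
    {
      print_usage ();
      return octave_value_list ();
    }

  // A is const, so the const overload of fortran_vec is selected
  // and the underlying data is not copied.
  const NDArray A = args(0).array_value ();
  if (error_state)
    return octave_value_list ();  // argument is not a real array

  const double *p = A.fortran_vec ();
  octave_idx_type N = A.nelem ();

  // Single pass over all elements in storage order.
  double sum = 0;
  for (octave_idx_type i = 0; i < N; i++)
    sum += *(p++);

  return octave_value (sum);
}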
----
Seb