help-octave
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

execution speed in *oct files


From: John W. Eaton
Subject: execution speed in *oct files
Date: Mon, 7 Jun 1999 20:34:35 -0500 (CDT)

On  4-Jun-1999, John W. Eaton <address@hidden> wrote:

| On  3-Jun-1999, A+A Adler <address@hidden> wrote:
| 
| | I just submitted an oct file for 2-D convolutions
| | to octave-sources.
| | 
| | While I was writing it, I noticed that the standard octave
| | syntax seems to be very inefficient.

[code examples elided]

| |            Why are pointers so much more efficient than
| |              matrix refernces?
| 
| Because the indexing operators invoke some functions to do the
| indexing calculation.  I'd guess that either the functions are not all
| inlined, or the compiler is not able to convert the addition and
| multiplication of the indexing operation to a simple increment (as you
| have), or both.

I've done a few quick tests using the following code.  It simply adds
two real matrices and returns the result.  The first two arguments are
the matrices to add.  The third argument controls the code block used
to perform the additions, and the last argument says how many times to
perform the addition (to ensure that it takes long enough to get
meaningful timing results).

  #include <octave/oct.h>

  DEFUN_DLD (foo, args, ,
    "")
  {
    octave_value_list retval;

    int nargin = args.length ();

    int n = (nargin > 2)
      ? static_cast<int> (args(2).double_value ()) : 0;

    int max_try = (nargin == 4)
      ? static_cast<int> (args(3).double_value ()) : 5;

    if (nargin < 2)
      {
        usage ("foo (a, b [, case [, max_try]])");
        return retval;
      }

    switch (n)
      {
      case 0:
        {
          // Use Matrix::operator + (const Matrix&, const Matrix&)

          Matrix a = args(0).matrix_value ();
          Matrix b = args(1).matrix_value ();

          Matrix r;

          for (int k = 0; k < max_try; k++)
            r = a + b;

          retval(0) = r;
        }
        break;

      case 1:
        {
          // Use loops with Matrix::operator () (int, int) for indexing.

          Matrix a = args(0).matrix_value ();
          Matrix b = args(1).matrix_value ();

          int n = a.rows ();
          int m = a.cols ();

          Matrix r (n, m);

          for (int k = 0; k < max_try; k++)
            for (int j = 0; j < m; j++)
              for (int i = 0; i < n; i++)
                r(i,j) = a(i,j) + b(i,j);

          retval(0) = r;
        }
        break;

      case 2:
        {
          // Use loops with Matrix::operator () (int, int) const for
          // indexing elements on RHS of assignment operator and
          // non-const indexing operator on LHS.

          const Matrix a = args(0).matrix_value ();
          const Matrix b = args(1).matrix_value ();

          int n = a.rows ();
          int m = a.cols ();

          Matrix r (n, m);

          for (int k = 0; k < max_try; k++)
            for (int j = 0; j < m; j++)
              for (int i = 0; i < n; i++)
                r(i,j) = a(i,j) + b(i,j);

          retval(0) = r;
        }
        break;

      case 3:
        {
          // Use loops with Matrix::operator () (int, int) const for
          // indexing elements on RHS of assignment operator and
          // built-in pointer for accessing elements on LHS.

          const Matrix a = args(0).matrix_value ();
          const Matrix b = args(1).matrix_value ();

          int n = a.rows ();
          int m = a.cols ();

          Matrix r (n, m);

          for (int k = 0; k < max_try; k++)
            {
              double *pr = r.fortran_vec ();

              for (int j = 0; j < m; j++)
                for (int i = 0; i < n; i++)
                  *pr++ = a(i,j) + b(i,j);
            }

          retval(0) = r;
        }
        break;

      case 4:
        {
          // Use pointers all around, and one loop instead of two.

          const Matrix a = args(0).matrix_value ();
          const Matrix b = args(1).matrix_value ();

          int n = a.rows ();
          int m = a.cols ();

          Matrix r (n, m);

          int nm = n * m;

          for (int k = 0; k < max_try; k++)
            {
              const double *pa = a.data ();
              const double *pb = b.data ();

              double *pr = r.fortran_vec ();

              for (int i = 0; i < nm; i++)
                *pr++ = *pa++ + *pb++;
            }

          retval(0) = r;
        }
        break;

      default:
        // Do nothing.  Call this case to see whether overhead for
        // the function call is significant.
        break;
      }

    return retval;
  }

Here are the results on my system (a 266 MHz Pentium II running Linux
and using egcs 1.1.2).

  $ octave
  GNU Octave, version 2.0.14 (i686-pc-linux-gnu).
  Copyright (C) 1996, 1997, 1998, 1999 John W. Eaton.
  This is free software with ABSOLUTELY NO WARRANTY.
  For details, type `warranty'.

  octave:1> foo
  usage: foo (a, b [, case [, max_try]])
  octave:1> a = rand (1000); b = rand(1000);
  octave:2> for i = -1:4 t = cputime(); foo (a, b, i, 10); cputime() - t endfor
  ans = 0        # do nothing
  ans = 2.3700   # operator +
  ans = 3.0800   # non-const operator () on RHS and LHS
  ans = 2.4500   # const operator () on RHS, non-const on LHS
  ans = 2.3000   # const operator () on RHS, pointer on LHS
  ans = 2.2600   # pointers all around

So, if you use const where appropriate, then I think the performance
should be about the same whether you use the indexing operators or
pointers.  Even without const (which causes the operators to have to
do a bit more work to decide whether the data needs to be copied
prior to returning a reference), I don't think you should see a speed
up of 3 times just by switching to using pointers.

Perhaps the problem is some missed optimization.  What compiler and
options are you using?

Just to see what would happen, I tried using -fno-strength-reduce (you
don't say how your copy of Octave was compiled, but I seem to recall
that Dirk Eddelbuettel compiles the Debian distributions with this
flag).  Here are the results I see when using -fno-strength-reduce to
compile the .oct file:

  octave:1> foo
  usage: foo (a, b [, case [, max_try]])
  octave:1> a = rand (1000); b = rand(1000);
  octave:2> for i = -1:4 t = cputime(); foo (a, b, i, 10); cputime() - t endfor
  ans = 0        # do nothing
  ans = 2.3300   # operator +                              
  ans = 6.0800   # non-const operator () on RHS and LHS    
  ans = 5.5200   # const operator () on RHS, non-const on LHS
  ans = 3.9400   # const operator () on RHS, pointer on LHS
  ans = 2.3900   # pointers all around

Ouch.  Strength reduction really helps here.  Unless we resort to
writing low-level code that is very difficult to maintain (not my
first choice), we are going to have to have good optimizing compilers,
or at least use the optimzations that are available to us.

jwe



---------------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.  To ensure
that development continues, see www.che.wisc.edu/octave/giftform.html
Instructions for unsubscribing: www.che.wisc.edu/octave/archive.html
---------------------------------------------------------------------



reply via email to

[Prev in Thread] Current Thread [Next in Thread]