avr-libc-dev

Re: [avr-libc-dev] Pow Function in avr8


From: Georg-Johann Lay
Subject: Re: [avr-libc-dev] Pow Function in avr8
Date: Fri, 30 Nov 2012 00:06:24 +0100
User-agent: Thunderbird 2.0.0.24 (Windows/20100228)

Thomas, George wrote:
If you know that the exponent is integral you may want to have a look at
GCC's __builtin_powi* functions.

http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

You asked for a base of 2, maybe ldexp() fits your use case.
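A minimal sketch of both suggestions, assuming GCC (the builtin is a GCC extension, not ISO C). The helper names pow2_int and pow_int are made up for illustration. Note that the GCC manual gives no precision or rounding guarantees for the powi builtins.

```c
#include <math.h>

/* 2^n for integer n: ldexpf() only scales by a power of two,
   so pow() and its exp/log dependencies are not needed. */
static float pow2_int(int n)
{
    return ldexpf(1.0f, n);
}

/* x^n for integer n via the GCC builtin; on targets without a
   better expansion this becomes a call to __powisf2 in libgcc. */
static float pow_int(float x, int n)
{
    return __builtin_powif(x, n);
}
```

With these, pow2_int(10) yields 1024 and pow_int(3.0f, 4) yields 81.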

Handling integral exponents separately will increase the code size
because the exponent must be checked at run time and extra code must be
executed in that case.

[...]

The builtin called __powisf2 in libgcc.

The code size and cycles obtained in avrstudio6 were as follows.

Function   Size  Cycles
pow         152    5525
__powisf2   210     452

The sizes are misleading. They only account for the raw functions, not for the sizes of their dependencies like exp, log, division, prologue helpers, etc.

Also, the code in libc already seems to check whether the exponent is an
integer, so would calling the libgcc function be advisable?

I am not sure about that.

Most applications need a small code size, but each special case that has to be handled at run time drags more dependency functions in from the libraries.

And I still wonder if such an optimization is "important":

If the user knows that the exponent is integral, he can use __builtin_powi() in the first place.

If, on the other hand, we know nothing about the exponent, then it is reasonable to assume that hitting an integral exponent is very unlikely: the integers are a null set among the reals, so the expected speed gain will be really small.

Let's have a look at the raw sizes.

__powisf2 is open coded in C in libgcc2.c [1], basically

float
__powisf2 (float x, int m)
{
  unsigned int n = m < 0 ? -m : m;
  float y = n % 2 ? x : 1;
  while (n >>= 1)
    {
      x = x * x;
      if (n % 2)
        y = y * x;
    }
  return m < 0 ? 1/y : y;
}
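To see how the loop walks the exponent bits, here is the same square-and-multiply loop under a hypothetical name (powi_traced) with a multiplication counter added. The counter and the unsigned negation, which avoids overflow for the most negative int, are my additions and not part of libgcc.

```c
/* Counts float multiplications done by the last call (illustration only). */
static unsigned mul_count;

static float powi_traced(float x, int m)
{
    /* Negate in unsigned arithmetic so the most negative int is safe. */
    unsigned int n = m < 0 ? -(unsigned int)m : (unsigned int)m;
    float y = n % 2 ? x : 1;      /* lowest exponent bit */
    mul_count = 0;
    while (n >>= 1) {             /* one iteration per remaining bit */
        x = x * x;                /* x now holds the next squared power */
        mul_count++;
        if (n % 2) {              /* bit set: fold that power into y */
            y = y * x;
            mul_count++;
        }
    }
    return m < 0 ? 1 / y : y;     /* negative exponent: one division */
}
```

For x = 3 and m = 5 (binary 101) the loop squares twice and multiplies once, so it returns 243 after 3 multiplications. That is why the cycle count is so much lower than for a pow that goes through exp and log.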

Compiling with GCC 4.6.2 and -mcall-prologues -Os gives the size you mentioned above. With 4.7.2 and, in addition, -fno-split-wide-types we see a size of 138, which is about 34% less code. Presumably we see PR52278 [2] in action.

This means there is much room for improvement; an assembler programmer will easily reduce the size below 100 without the need of prologue / epilogue helpers.

log (and thus pow) uses a power series expansion that does not converge very well and has a small radius of convergence of 1, namely the Mercator series [3].
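For reference, the Mercator series written out. The error bound is the standard alternating-series estimate, not something measured on avr-libc:

```latex
\ln(1+x) \;=\; \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k}\, x^{k}
         \;=\; x - \frac{x^{2}}{2} + \frac{x^{3}}{3} - \dots ,
\qquad -1 < x \le 1 .
% For 0 \le x \le 1 the truncation error after N terms is at most
% x^{N+1}/(N+1), which near x = 1 shrinks only like 1/(N+1).
```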

There are other representations of log that might yield better results, like the inverse hyperbolic tangent (artanh) or cubic splines.
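A small host-side sketch to compare the two series at equal term counts. The function names are invented, nothing here is taken from avr-libc, and it uses double for clarity (on AVR, double is 32 bits by default).

```c
/* Mercator: ln(x) = t - t^2/2 + t^3/3 - ...  with t = x - 1, 0 < x <= 2. */
static double log_mercator(double x, int terms)
{
    double t = x - 1.0, p = t, sum = 0.0;
    for (int k = 1; k <= terms; k++) {
        sum += (k % 2 ? p : -p) / k;   /* alternating sign */
        p *= t;
    }
    return sum;
}

/* artanh form: ln(x) = 2*artanh(z) = 2*(z + z^3/3 + z^5/5 + ...)
   with z = (x - 1)/(x + 1), valid for every x > 0. */
static double log_artanh(double x, int terms)
{
    double z = (x - 1.0) / (x + 1.0), z2 = z * z, p = z, sum = 0.0;
    for (int k = 0; k < terms; k++) {
        sum += p / (2 * k + 1);        /* odd powers only */
        p *= z2;
    }
    return 2.0 * sum;
}
```

At x = 1.5 the Mercator series advances in powers of 0.5, while the artanh form advances in powers of z^2 = 0.04, so with 8 terms each the second is many orders of magnitude closer to ln(1.5). Both still need range reduction in a real implementation.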

Maybe someone wants to go through the hassle of implementing a better version of pow, and of doing all the implementing and (regression) testing and benchmarking and support and whatever again, to speed up the stuff or gain 2 or 3 LSBs...

Johann

--

[1] http://gcc.gnu.org/viewcvs/trunk/libgcc/libgcc2.c?revision=184997&view=markup#l1744

[2] http://gcc.gnu.org/PR52278

[3] http://en.wikipedia.org/wiki/Mercator_series


