lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] Optimized integral power


From: Greg Chicares
Subject: Re: [lmi] Optimized integral power
Date: Thu, 22 Dec 2016 23:17:41 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Icedove/45.4.0

On 2016-12-22 14:45, Vadim Zeitlin wrote:
> On Thu, 22 Dec 2016 02:36:49 +0000 Greg Chicares <address@hidden> wrote:
> 
> GC> Please take a look at commit 569b775d3ffc4afc526a83cb1a769f6d604101a8
> GC> and let me know if you see any reason to keep the std::pow() alternative,
> GC> now in a #if 0 ... #endif block. I'm inclined to remove it soon.
> 
>  I agree, I don't see how std::pow() could do better than what the new code

https://www.sgi.com/tech/stl/power.html#3
| This is in fact not the minimum possible number of multiplications:
| it is possible to compute the fifteenth power of x using only five
| multiplications, but power(x, 15) uses six.

But we needn't worry about that. As an amusing digression, though...

Instead of looking up that example, I was going to say X^9:
  X3 = X*X*X  2 total multiplications
  X9 = X3*X3  3   "         "
whereas with binary powers:
  X2 = X*X    1 total multiplications
  X4 = X2*X2  2 total multiplications
  X8 = X4*X4  3 total multiplications
  X9 = X*X8   4 total multiplications
Am I missing something, or is Matt Austern's example not minimal
as I would have expected it to be, just because that's the way I
thought his mind might work?

Preliminarily, let
  P   = the power to which a number is to be raised
  N   = a number base (3 for ternary, e.g.)
  RN  = the base-N representation of P
  NOD = the number of digits in a particular Rb
  SOD = the sum of the digits in a particular Rb
  QN  = the total number of multiplications required to
        calculate x^P with N-ary power precalculations
For any base-N representation RN, the first digit is free because
it's multiplied by N^0, which is unity. The next digit costs (N-1)
multiplications, and each subsequent digit costs one more. Then the
number of multiplications in a base-N Russian peasant algorithm
(ignoring tiny P for which RN < 2) would be A+B+C, where
  A = N - 1
  B = NOD(RN) - 2
  C = SOD(RN) - 1
because
 - the first digit costs nothing
 - the second digit costs A
 - K digits cost (K-2)=B (we already took care of the first two)
 - and to form the final product we must select SOD(RN) of the
   powers already calculated, and multiplying K numbers take
   (K-1)=C multiplications
Thus
   QN(P,N) = A+B+C, or, more directly,
           = (N - 1) + (NOD(RN) - 2) + (SOD(RN) - 1)
           = N + NOD(RN) + SOD(RN) - 4
  
For P=9:

                       A       B       C   | A   B + C | total
 9 =   1001  binary: (2-1) + (4-2) + (2-1) = 1 + 2 + 1 =   4
 9 =    100 ternary: (3-1) + (3-2) + (1-1) = 2 + 1 + 0 =   3

Testing the simplified formula:

                      N + NOD + SOD - 4 = QN
 9 =   1001  binary:  2 +  4  +  2  - 4 =  4
 9 =    100 ternary:  3 +  3  +  1  - 4 =  3

Let's try 35 to see whether that disproves the formula:

                      N + NOD + SOD - 4 = QN
35 = 100011  binary:  2 +  6 +   3  - 4 = 7
35 =   1022 ternary:  3 +  4 +   5  - 4 = 8
35 =    120 quinary:  5 +  3 +   3  - 4 = 7

...and work that example out as a demonstration:

  X2  = X*X           1 total multiplications
  X4  = X2*X2         2
  X8  = X4*X4         3
  X16 = X8*X8         4
  X32 = X16*X16       5
  X35 = X32*X2*X      7 = Q2(35)

  X3  = X*X*X         2
  X9  = X3*X3         3
  X27 = X9*X9         4
  X35 = X27*X3*X3*X*X 8 = Q3(35)

  X5  = X*X*X*X*X     4
  X25 = X5*X5         5
  X35 = X25*X5*X5     7 = Q5(35)

It seems interesting that the thirty-fifth power can be done no more
efficiently with quinary than with binary grouping of multiplications.
But that's not really startling: after all, the 323rd power can be
calculated more efficiently than by multiplying X by itself 17 times,
and that intermediate value by itself 19 times, because the 17th power
takes less work with binary grouping. That leads to questions like:
what is the lowest exponent P for which Z-ary grouping is better than
for any lower-order grouping--i.e., QZ(P) < QX(P) for all X < Z? and
is there a maximum Z for which that question has an answer? But we'll
postpone such matters for Ugolyok's birthday, which begins within the
hour in the Zulu timezone.

> does (well, in theory some CPU could have a dedicated instruction for
> performing this operation, but in practice I don't know of any such
> instruction).

Like those wonderful REP instructions on the 8086.

>  On a completely unrelated note, I've looked at stl_extensions.hpp while
> reviewing this and noticed copy_n() and iota() functions in it. It's true
> that they were only available in some extensions of C++98 STL, but now they
> are part of C++11 standard, so I think they should be removed now and the
> corresponding std versions should be used instead. This looks especially
> simple to do because each of these functions is only used once (in
> ihs_acctval.cpp for copy_n and in gpt_test.cpp for iota). As always, I'm
> ready to make a patch (or two) if it could be useful.

What I was going to say, until I started wondering why Austern picked
15 as a counterexample above, was that I had hesitated to make this
change for fear it would collide with C++11-ization work that you might
already have done locally, but--concluding that you haven't--I've done
it (including is_sorted(), too), and finally pushed it a moment ago.
It seemed better for me to do it myself because it's quite mechanical
and I can use the vim practice more than you.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]