gcl-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Gcl-devel] Re: Improvement to GCL's EXPT?


From: Camm Maguire
Subject: [Gcl-devel] Re: Improvement to GCL's EXPT?
Date: 05 Apr 2005 17:50:10 -0400
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2

Greetings!

Well, the ultimate way is in this small change to CVS head (aka 2.7.0
-- don't use this yet until release please :-)):

>(in-package 'compiler)

COMPILER>(defun expt-2-to-ash (form env)
 (declare (ignore env))
 (if (eql (cadr form) 2) `(ash 1 ,(caddr form)) form))

EXPT-2-TO-ASH

COMPILER>(si::putprop 'expt (function expt-2-to-ash) 'compiler-macro)

(LAMBDA-BLOCK EXPT-2-TO-ASH (FORM ENV)
  (DECLARE (IGNORE ENV))
  (IF (EQL (CADR FORM) 2) '(ASH 1 (CADDR FORM)) FORM))

COMPILER>(defun foo (x) (declare ((integer 0 31) x))  (expt 2 x))

FOO

COMPILER>(disassemble 'foo)

...

static void L1()
{register object *base=vs_base;
        register object *sup=base+VM1; VC1
        vs_check;
        {long V1;
        V1=fix(base[0]);
        vs_top=sup;
        goto TTL;
TTL:;
        V2 = V1;
        base[1]= CMPmake_fixnum((long)(((fixnum)1)<<(V2)));
        vs_top=(vs_base=base+1)+1;
        return;
        }
}


Note that this will only kick in if x is known to be less than
(integer-length most-positive-fixnum) (i.e. nfix has to be declared
appropriately), as the following entries in gcl_cmpopt.lsp indicate:


(si::putprop 'ash 'ash-propagator 'type-propagator)
(push '(((integer 0 0) t) fixnum #.(flags rfa)"0")
   (get 'ash 'inline-always))
(push '((fixnum (integer 0 #.(integer-length most-positive-fixnum))) fixnum 
#.(flags)"((#0)<<(#1))")
   (get 'ash 'inline-always))
(push '((fixnum (integer #.most-negative-fixnum -1)) fixnum #.(flags set)
        #.(concatenate 'string "@1;(-(#1)&"
                       (write-to-string (lognot (integer-length 
most-positive-fixnum)))
                       "? ((#0)>=0 ? 0 : -1) : (#0)>>-(#1))"))
   (get 'ash 'inline-always))


We've done some pretty heavy work on optimization in 2.7.0, and still
aren't done.

Of course, the boxed version can also check for 2 as an argument and
simplify.  Really the most important in here is to fixnum divide
instead of integer_divide1 when possible.

I'll play with these a bit more and commit to cvs head hopefully
soon.  If it is really important, please let me know and I'll try to
see how to accomplish the same in 2.6.6.

There may also be a gmp shift we can bring forward.

Take care,


Matt Kaufmann <address@hidden> writes:

> Hi, David --
> 
> First, let me say that such a GCL mod would be welcome!  But here's an 
> approach
> that doesn't rely on it, based on (as you already observed) the idea of
> shifting.
> 
> I recently faced a similar issue when dealing with books/rtl/rel4/.  There,
> specifically in books/rtl/rel4/lib/rtl.lisp, functions bits and bitn were
> defined as follows:
> 
> (defund bits (x i j)
>   (declare (xargs :guard (rationalp x)))
>   (if (or (not (integerp i))
>           (not (integerp j)))
>       0
>     (fl (/ (mod x (expt 2 (1+ i))) (expt 2 j)))))
> 
> (defund bitn (x n)
>     (declare (xargs :guard (rationalp x)))
>   (bits x n n))
> 
> I have recently changed these to the following (this is planned for inclusion
> in ACL2 Version 2.9.2):
> 
> (defund bits (x i j)
>   (declare (xargs :guard (and (natp x)
>                               (natp i)
>                               (natp j)
>                               (<= j i))))
>   (mbe :logic (if (or (not (integerp i))
>                       (not (integerp j)))
>                   0
>                 (fl (/ (mod x (expt 2 (1+ i))) (expt 2 j))))
>        :exec  (logand (ash x (- j)) (1- (ash 1 (1+ (- i j)))))))
> 
> (defund bitn (x n)
>   (declare (xargs :guard (and (natp x)
>                               (natp n))))
>   (mbe :logic (bits x n n)
>        :exec  (if (evenp (ash x (- n))) 0 1)))
> 
> Notice that the :logic definitions are just the previous bodies, but the :exec
> functions avoid the slow operations.  I did very minimal testing and got
> speedups of something like 13x for bits and 20x for bitn (as I recall).
> 
> I don't know how familiar you are with mbe, but a number of Rockwell folks 
> have
> used it so I'm guessing you are.  (If not: it's documented, and also I'd be
> happy to explain if you have questions.)  A really cool thing about the 
> changes
> above is that I didn't need to make any changes to existing proofs (and there
> were already a LOT of proofs!).  That's just how mbe is supposed to work.  The
> downside is that I had to prove that in each case, the :logic and :exec
> definitions agreed, which was a bit of work.
> 
> In the short term, you could make the appropriate changes in your copy of the
> ihs books, use :verify-guards nil, and then (skip-proofs (verify-guards ...)).
> I'm guessing you know what I mean, but if that's not perfectly clear, let me
> know and I'll be more clear.  In the longer term, it would be a nice
> improvement to the ihs books for you to do the proofs and donate the
> improvement to the ACL2 community.
> 
> (Speaking of which, some time ago I think I had email with Matt Wilding about
> contributing books... but I haven't seen it yet.  AMD has been good about
> making such work available for years now, so I hope you guys will contribute
> some of your excellent work that is general-purpose, like ihs library
> improvements if those come -- obviously that's not expected for stuff related
> to your processors).
> 
> Oh, and as a hack, you could cheat and redefine expt2 in raw Lisp, or using
> :redef in the ACL2 loop.
> 
> -- Matt
>    From: <address@hidden>
>    Date: Tue, 5 Apr 2005 14:33:06 -0500
>    X-MIMETrack: Serialize by Router on 
> CollinsCRSMTP02/CedarRapids/RockwellCollins(Release
>     6.5.3|September 14, 2004) at 04/05/2005 02:33:09 PM,
>          Serialize complete at 04/05/2005 02:33:09 PM
>    Content-Type: multipart/alternative; boundary="=_alternative 
> 006B66CA86256FDA_="
>    X-SpamAssassin-Status: No, hits=-2.1 required=5.0
>    X-UTCS-Spam-Status: No, hits=-252 required=180
> 
>    This is a multipart message in MIME format.
>    --=_alternative 006B66CA86256FDA_=
>    Content-Type: text/plain; charset="US-ASCII"
> 
>    Gents,
> 
>    I've been pawing through some profiling output derived from running sample 
>    machine code on a processor model implemented in ACL2.  My near-term goal 
>    is to reduce some of the "hotspots" that gprof has identified. 
>    Unsurprisingly, a number of the hostspots have to do with operations that 
>    lead to alloc_object() (and then inevitably on to gc).  By way of 
>    background: the processor model in question has taken a number of steps to 
>    eliminate "bignum" operations; however, over time, a general desire for 
>    elegance, and frankly, developer laziness, has allowed a number of these 
>    bad boys to sneak back in.  Part of my near-term job is to eliminate the 
>    need for bignums once again; however, along the way I thought I'd document 
>    opportunities that profiling is identifying for ACL2/GCL to execute more 
>    efficiently, even when bignums are used (which is to say, most of the 
>    time).
> 
>    I was surprised when I saw that a majority of the calls to alloc_object() 
>    were due to integer_quotient_remainder_1(), i.e.
> 
>    index % time    self  children    called     name
>                  0.00    0.00       2/1931716     Llist [165]
>                  0.00    0.44   90636/1931716     make_bignum [63]
>                  0.00    1.18  242556/1931716     make_fixnum1 [37]
>                  0.01    7.77 1598522/1931716 integer_quotient_remainder_1 
>    [7]
>    [2]     86.8    0.01    9.39 1931716         alloc_object [2]
> 
>    Basically, this gprof output is saying that 1,598,522 of the 1,931,716 
>    calls to alloc_object() are coming from integer_quotient_remainder_1(). 
>    Also it shows that 86.8% of the time is spent in alloc_object() or its 
>    descendents, which includes all the garbage collection activity (thus the 
>    high percentage).  As I said, this was surprising, since I'm not doing 
>    much division in my model.  Well, at least not on purpose:
> 
>                  0.00    0.00       4/799261      Ltruncate [161]
>                  0.00    0.88   88188/799261      Lfloor [47]
>                  0.00    7.09  711069/799261      integer_divide1 [8]
>    [7]     73.6    0.00    7.97  799261         integer_quotient_remainder_1 
>    [7]
> 
>    Which, in turn, leads (mostly) to 
> 
>                  0.00    0.14   13776/711069      make_ratio [94]
>                  0.01    6.95  697293/711069      number_expt [6]
>    [8]     65.5    0.01    7.09  711069         integer_divide1 [8]
> 
>    Which, again, following the function that's doing the most calls, leads us 
>    up the call stack to 
> 
>                  0.03    8.17  202219/202219 
>    LI6__EXPT2__books_ihs_logops_definitions [5]
>    [6]     75.7    0.03    8.17  202219         number_expt [6]
> 
>    Ah ha!  Finally a Lisp (and ACL2) function!  EXPT2 is a simple function in 
>    the IHS (Integer Hardware Specification) book, which just appeals to 
>    Lisp's EXPT, as follows:
> 
> 
>    (defun expt2 (n)
>      (declare (xargs :guard (and (integerp n)
>                                (<= 0 n))))
>      (expt 2 (nfix n)))
> 
> 
>    BTW, The EXPT2 function is used all over the place in IHS, e.g. in 
>    LOGMASK, LOGMASKP, LOGHEAD, LOGTAIL, LOGAPP, and LOGSAT.  Granted, a shift 
>    could have been used instead, but that's not how IHS was designed, and I 
>    am very hesitant to try to change it willy-nilly.
> 
>    So, apparently each call of (expt 2 ...) causes several calls of 
>    integer_divide1(), which in turns causes a call of 
>    integer_quotient_remainder_1(), which in turn causes several calls of 
>    alloc_object().  So, by my crude math, every call of (expt 2 ...) causes 
>    ~8 calls of alloc_object(), not to mention the overhead of the 
>    intermediate calls. 
> 
>    Thus, based on my data, I suggest that the GCL folks take a look at the 
>    implementation of EXPT, and introduce a "short-circuit" case for (EXPT 2 
>    n), for integer n >= 0.  I do not consider myself competent to do so; 
>    besides, I am focusing my efforts on eliminating the calls that lead to 
>    the (EXPT 2 ...) in the first place.
> 
> 
>    Thank you for your consideration,
> 
>    David Hardin
> 
>    P.S.  Please let me know if this sort of analysis is 
>    unhelpful/obscure/incorrect/etc., and I will bother you no more with it.
> 
>    --=_alternative 006B66CA86256FDA_=
>    Content-Type: text/html; charset="US-ASCII"
> 
> 
>    <br><font size=2 face="Courier New">Gents,</font>
>    <br>
>    <br><font size=2 face="Courier New">I've been pawing through some profiling
>    output derived from running sample machine code on a processor model 
> implemented
>    in ACL2. &nbsp;My near-term goal is to reduce some of the 
> &quot;hotspots&quot;
>    that gprof has identified. &nbsp;Unsurprisingly, a number of the hostspots
>    have to do with operations that lead to alloc_object() (and then inevitably
>    on to gc). &nbsp;By way of background: the processor model in question
>    has taken a number of steps to eliminate &quot;bignum&quot; operations;
>    however, over time, a general desire for elegance, and frankly, developer
>    laziness, has allowed a number of these bad boys to sneak back in. 
> &nbsp;Part
>    of my near-term job is to eliminate the need for bignums once again; 
> however,
>    along the way I thought I'd document opportunities that profiling is 
> identifying
>    for ACL2/GCL to execute more efficiently, even when bignums are used (which
>    is to say, most of the time).</font>
>    <br>
>    <br><font size=2 face="Courier New">I was surprised when I saw that a 
> majority
>    of the calls to alloc_object() were due to integer_quotient_remainder_1(),
>    i.e.<br>
>    </font>
>    <br><font size=2 face="Courier New">index % time &nbsp; &nbsp;self 
> &nbsp;children
>    &nbsp; &nbsp;called &nbsp; &nbsp; name</font>
>    <br><font size=2 face="Courier New">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
>    &nbsp; &nbsp; &nbsp; 0.00 &nbsp; &nbsp;0.00 &nbsp; &nbsp; &nbsp; 2/1931716
>    &nbsp; &nbsp; Llist [165]</font>
>    <br><font size=2 face="Courier New">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
>    &nbsp; &nbsp; &nbsp; 0.00 &nbsp; &nbsp;0.44 &nbsp; 90636/1931716 &nbsp;
>    &nbsp; make_bignum [63]</font>
>    <br><font size=2 face="Courier New">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
>    &nbsp; &nbsp; &nbsp; 0.00 &nbsp; &nbsp;1.18 &nbsp;242556/1931716 &nbsp;
>    &nbsp; make_fixnum1 [37]</font>
>    <br><font size=2 face="Courier New">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
>    &nbsp; &nbsp; &nbsp; 0.01 &nbsp; &nbsp;7.77 1598522/1931716 &nbsp; &nbsp;
>    integer_quotient_remainder_1 [7]</font>
>    <br><font size=2 face="Courier New">[2] &nbsp; &nbsp; 86.8 &nbsp; 
> &nbsp;0.01
>    &nbsp; &nbsp;9.39 1931716 &nbsp; &nbsp; &nbsp; &nbsp; alloc_object 
> [2]</font>
>    <br>
>    <br><font size=2 face="Courier New">Basically, this gprof output is saying
>    that 1,598,522 of the 1,931,716 calls to alloc_object() are coming from
>    integer_quotient_remainder_1(). &nbsp;Also it shows that 86.8% of the time
>    is spent in alloc_object() or its descendents, which includes all the 
> garbage
>    collection activity (thus the high percentage). &nbsp;As I said, this was
>    surprising, since I'm not doing much division in my model. &nbsp;Well,
>    at least not on purpose:</font>
>    <br>
>    <br><font size=2 face="Courier New">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
>    &nbsp; &nbsp; &nbsp; 0.00 &nbsp; &nbsp;0.00 &nbsp; &nbsp; &nbsp; 4/799261
>    &nbsp; &nbsp; &nbsp;Ltruncate [161]</font>
>    <br><font size=2 face="Courier New">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
>    &nbsp; &nbsp; &nbsp; 0.00 &nbsp; &nbsp;0.88 &nbsp; 88188/799261 &nbsp;
>    &nbsp; &nbsp;Lfloor [47]</font>
>    <br><font size=2 face="Courier New">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
>    &nbsp; &nbsp; &nbsp; 0.00 &nbsp; &nbsp;7.09 &nbsp;711069/799261 &nbsp;
>    &nbsp; &nbsp;integer_divide1 [8]</font>
>    <br><font size=2 face="Courier New">[7] &nbsp; &nbsp; 73.6 &nbsp; 
> &nbsp;0.00
>    &nbsp; &nbsp;7.97 &nbsp;799261 &nbsp; &nbsp; &nbsp; &nbsp; 
> integer_quotient_remainder_1
>    [7]</font>
>    <br>
>    <br><font size=2 face="Courier New">Which, in turn, leads (mostly) to 
> </font>
>    <br>
>    <br><font size=2 face="Courier New">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
>    &nbsp; &nbsp; &nbsp; 0.00 &nbsp; &nbsp;0.14 &nbsp; 13776/711069 &nbsp;
>    &nbsp; &nbsp;make_ratio [94]</font>
>    <br><font size=2 face="Courier New">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
>    &nbsp; &nbsp; &nbsp; 0.01 &nbsp; &nbsp;6.95 &nbsp;697293/711069 &nbsp;
>    &nbsp; &nbsp;number_expt [6]</font>
>    <br><font size=2 face="Courier New">[8] &nbsp; &nbsp; 65.5 &nbsp; 
> &nbsp;0.01
>    &nbsp; &nbsp;7.09 &nbsp;711069 &nbsp; &nbsp; &nbsp; &nbsp; integer_divide1
>    [8]</font>
>    <br>
>    <br><font size=2 face="Courier New">Which, again, following the function
>    that's doing the most calls, leads us up the call stack to </font>
>    <br>
>    <br><font size=2 face="Courier New">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
>    &nbsp; &nbsp; &nbsp; 0.03 &nbsp; &nbsp;8.17 &nbsp;202219/202219 &nbsp;
>    &nbsp; &nbsp;LI6__EXPT2__books_ihs_logops_definitions [5]</font>
>    <br><font size=2 face="Courier New">[6] &nbsp; &nbsp; 75.7 &nbsp; 
> &nbsp;0.03
>    &nbsp; &nbsp;8.17 &nbsp;202219 &nbsp; &nbsp; &nbsp; &nbsp; number_expt
>    [6]</font>
>    <br>
>    <br><font size=2 face="Courier New">Ah ha! &nbsp;Finally a Lisp (and ACL2)
>    function! &nbsp;EXPT2 is a simple function in the IHS (Integer Hardware
>    Specification) book, which just appeals to Lisp's EXPT, as follows:</font>
>    <br>
>    <br>
>    <br><font size=2 face="Courier New">(defun expt2 (n)</font>
>    <br><font size=2 face="Courier New">&nbsp; (declare (xargs :guard (and
>    (integerp n)</font>
>    <br><font size=2 face="Courier New">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
>    &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 
> (&lt;=
>    0 n))))</font>
>    <br><font size=2 face="Courier New">&nbsp; (expt 2 (nfix n)))</font>
>    <br>
>    <br>
>    <br><font size=2 face="Courier New">BTW, The EXPT2 function is used all
>    over the place in IHS, e.g. in LOGMASK, LOGMASKP, LOGHEAD, LOGTAIL, LOGAPP,
>    and LOGSAT. &nbsp;Granted, a shift could have been used instead, but that's
>    not how IHS was designed, and I am very hesitant to try to change it 
> willy-nilly.</font>
>    <br>
>    <br><font size=2 face="Courier New">So, apparently each call of (expt 2
>    ...) causes several calls of integer_divide1(), which in turns causes a
>    call of integer_quotient_remainder_1(), which in turn causes several calls
>    of alloc_object(). &nbsp;So, by my crude math, every call of (expt 2 ...)
>    causes ~8 calls of alloc_object(), not to mention the overhead of the 
> intermediate
>    calls. &nbsp;</font>
>    <br>
>    <br><font size=2 face="Courier New">Thus, based on my data, I suggest that
>    the GCL folks take a look at the implementation of EXPT, and introduce
>    a &quot;short-circuit&quot; case for (EXPT 2 n), for integer n &gt;= 0.
>    &nbsp;I do not consider myself competent to do so; besides, I am focusing
>    my efforts on eliminating the calls that lead to the (EXPT 2 ...) in the
>    first place.</font>
>    <br>
>    <br>
>    <br><font size=2 face="Courier New">Thank you for your 
> consideration,</font>
>    <br>
>    <br><font size=2 face="Courier New">David Hardin</font>
>    <br>
>    <br><font size=2 face="Courier New">P.S. &nbsp;Please let me know if this
>    sort of analysis is unhelpful/obscure/incorrect/etc., and I will bother
>    you no more with it.</font>
>    <br>
>    --=_alternative 006B66CA86256FDA_=--
> 
> 
> 

-- 
Camm Maguire                                            address@hidden
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah




reply via email to

[Prev in Thread] Current Thread [Next in Thread]