guile-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

guile 3 update: more number unboxing improvements


From: Andy Wingo
Subject: guile 3 update: more number unboxing improvements
Date: Wed, 29 Nov 2017 09:14:49 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)

Hi,

In the last update
(https://lists.gnu.org/archive/html/guile-devel/2017-11/msg00010.html) I
talked a lot about fixnums.  This last couple weeks' work is along the
same lines.

Firstly I added support for specialized instructions that compare (in
the < or = sense) s64 and u64 values against 8-bit immediates.  Then I
started to do some "exploding" of the comparison operators: for example,
if a number is a 64-bit integer, maybe first see if it's a fixnum and
take a fast path there, otherwise call out to a scm->u64 primitive.

I was thinking maybe we would be able to inline even scm->u64 entirely,
but this turned out to be a bad idea, and indeed even the speculative
inlining of the fixnum case of e.g. < wasn't that great.  The problem is
that if you replace a simple primcall (define x FOO) with a diamond
control structure like (define x (if FIXNUM? FOO-1 FOO-2)) early in the
compiler, that makes it harder to do code motion like CSE or LICM on X,
as X now has two definitions.  It could make sense to reify a diamond
control flow more towards the compiler back-end, but not in the
beginning.  Instruction explosion only makes sense if some of the
exploded instructions can be eliminated or are themselves subject to
CSE.  If it's just exploding instructions that can never be eliminated,
probably it's better to call out to the runtime (as in the case of
scm->u64, unless it can be reduced to untag-fixnum).

By this time I had backed myself into a little corner with number
specialization, so I had to back out and fix my mess; took a week or so,
but then I was back on track, with a new idea.  

The problem I was looking to solve was to hoist a fixnum? check above
code like (= x (logand x #xff)).  This predicate will only succeed if X
is a fixnum.  I then realized it would be tricky, as it would involve
hoisting the assertion made inside logand that X is an exact integer
"above" (but to where?), and it could be that the assertion doesn't
commute with other effects so it can't be hoisted.

So I modified the problem :)  Instead I wanted to optimize this kind of
code:

(define (t x)
  (unless (exact-integer? x) (error "expected an integer" x))
  (unless (= x (logand x #xff) (error "out of range" x))
  (* x 2))

or

(define (t x)
  (unless (exact-integer? x) (error "expected an integer" x))
  (unless (<= -10 x 100) (error "out of range" x))
  (* x 2))

Here exact-integer? is effectively (or (fixnum? x) (bignum? x)), so we
have a more precise type of X flowing into the predicate.  However it's
not magical; the "=" in the first example still sees that X could be a
bignum, and likewise for the <= operations.

What you want to do here is to "devirtualize" part of this function.
This is a compiler pass that effectively inlines the concrete
implementations of a virtual operation like =, logand, or <=.  But you
don't want to devirtualize just one operation; you want to devirtualize
a trace of operations.  Like this:

  (define (out-of-range x) (error "out of range" x))
  (define (not-int x) (error "expected an integer" x))
  (cond
   ((fixnum? x)
    (if (<= -10 x 100)
        (* x 2)
        (out-of-range x)))
   ((bignum? x)
    (if (<= -10 x 100)
        (* x 2)
        (out-of-range x)))
   (else
    (not-int x)))

This causes code growth initially, but probably causes shrinkage later;
concretely this code optimizes to:

  (define (out-of-range x) (error "out of range" x))
  (define (not-int x) (error "expected an integer" x))
  (cond
   ((fixnum? x)
    (let ((x* (untag-fixnum x)))
      (if (s64<= -10 x*)
          (if (s64<= x* 100)
              (tag-fixnum (s64+ x* x*))
              (out-of-range x))
          (out-of-range x))))
   ((bignum? x)
    (out-of-range x)
   (else
    (error "expected an integer" x)))

which is indeed what we get in code:

  Disassembly of #<procedure t (x)> at #xb437a0:

     0    (assert-nargs-ee/locals 2 0)    ;; 2 slots (1 arg)    at (unknown 
file):10:0
     1    (immediate-tag=? 0 3 2)         ;; fixnum?            at (unknown 
file):11:10
     3    (jne 11)                        ;; -> L1
     4    (untag-fixnum 1 0)                                    at (unknown 
file):12:10
     5    (s64-imm<? 1 -10)               
     6    (jl 14)                         ;; -> L2
     7    (imm-s64<? 1 100)               
     8    (jl 12)                         ;; -> L2
     9    (mov 0 1)                                             at (unknown 
file):13:2
    10    (uadd 1 0 1)                    
    11    (tag-fixnum 0 1)                
    12    (handle-interrupts)             
    13    (return-values 2)               ;; 1 value
  L1:
    14    (immediate-tag=? 0 7 0)         ;; heap-object?       at (unknown 
file):11:10
    16    (jne 8)                         ;; -> L3
    17    (heap-tag=? 0 4095 279)         ;; bignum?
    19    (jne 5)                         ;; -> L3
  L2:
    20    (throw/value 0 138)             ;; #(misc-error #f "out of range ~S") 
at (unknown file):12:25
    22    (handle-interrupts)             
    23    (return-values 1)               ;; 0 values
  L3:
    24    (throw/value 0 150)             ;; #(misc-error #f "expected an 
integer ~S") at (unknown file):11:29
    26    (handle-interrupts)             
    27    (return-values 1)               ;; 0 values

Pretty good code I think, though I should probably improve the
disassembler to make more readable disassemblies.  I would be interested
if anyone knows if other compilers do this.  It would appear that GCC
and LLVM do not:

  
https://www.godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAKxAEZSBnVAV2OUxAHIBSAJgGY8AO2QAbZlgDU3fgGEGBfEIIA6BDOzcADAEFtOhcWbICkgEZ5gQ5gFsZAIX37mQvKiGThBTMEzFpAOyOupKhYcxeAA4ExAD6pgBmeAAe1nb8wTph2ZKGxqYWVraSAFSFaQ5OAQAilbrOru6eyj5%2B5pZpsQCG6Oi0khAubh5ercQAlHU6Q02jvv5RMfHxqLHlxRCLcQSTGU66Cl0EeMjNosKYkhHKABzxngyxSakbMyMt856T9UH6OZLETAEVgeTYqZ5paS8ABsklo42k/FqSLhU24NX2BgIRxOZwuV0a728ny2yyITxSkM2ymi20kyW%2Bel%2BIRyb2axLagNMMmqgXsknBlOKPIGUGuBFp8QRDMRshksjhCPRCv66ORmX%2BoS5IIBQLRNVCmMOx1OwnOQkubLmbVJBBWsWt/mpEqWpgZVQ1OTwCVFYtt0qV/E0Qck0N4Sphir%2Bmq1QJ1tvt6xsA3dexZMdCmFEDEw0YzOW1xCJkrt5IhG1TmTVRuxJrxFvZJck5Zs9ttAytHxtjPRnuyhdBzpLCLwgpeNkDwewqLTTNq9QOtdxZvxnY5/hb3V6/UGhPZYy%2BHrzYQA9CfJAB1S4AayEqAA7pIekpgHDJPf3GBOKZUAA3PwJKID6SAQCCXLaKjHv2cZFo2rptl2TqbuS7abBG/Lwvq856IuOKmkI5qWnujpPtuHbEYhh4/H2YQDg8FLjgMeCBgAYs2Qqts%2BO7MZIIDtEUnFkWhWGYmuB7PgALORwz7iSPbMlkBYwR4XEDKpEDqepzHjDpWGcOMpCiFwACsnCkEIXBaGZqBcHKvCOPZuQsGwlx8PwtBmQQln6QZ14gBJACcKgBSFoVhWFhlcBJZkWZwVmkDZnBmQwIBaKQXlxfppBwLASBoDYkR4KIfjkJQ%2BWFcVxAoKIXRCMAxlaGlSSiN4xApRAZjeaQFhCF0xAAJ5cB5pD5TYmDKAA8gRg2ZaQWA2LVwDFV1%2BCAiYeD/ils2YMkmDIMw3hDWZoxGbNMR4DY3kGecZgpZABmoNEwxbQAtBN/DJc57B0NdJkxV1iXJDc0IvdCUliItkjGSoWgwwMuCECQULuaQkiyKgBVFW0bnwp5V2%2Bf5QXhcToWRZw0XmQDXDJal6X42TvD/bNiV45lOmkP%2BbXDP5QA

I.e. each individual fixnum op is inlined, but GCC and LLVM don't peel
out a trace for the fixnum case.  I call this pass "integer
devirtualization" but also use the term "peeling a trace".  Words!

So that's my status.  Next up, I need to fix <= and >= on NaN values, as
I didn't get to that from last time; then I will move on to instruction
explosion of vector-ref and related instructions.  But first, I promised
Ludovic I would fix the compiler speed issue in slot allocation, both
for 2.2 and 3.0, so I will do that :)

Cheers,

Andy



reply via email to

[Prev in Thread] Current Thread [Next in Thread]