guile-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

gmp issues (long)


From: Rob Browning
Subject: gmp issues (long)
Date: Mon, 24 Feb 2003 16:29:04 -0600
User-agent: Gnus/5.090008 (Oort Gnus v0.08) Emacs/21.2 (i386-pc-linux-gnu)

I have a version of guile working now using gmp for bignums, and now
that I can test it, there are some issues worth discussing.

Now that I've fixed some bugs confusing the issue (though there may
well be others) it appears that at least in some fairly trivial
testing (and I emphasize *trivial*), gmp doesn't appear faster than
our old implementation.

Given this code:

  (define (bench-random-int-op op)
    (let loop ((x-vals random-x-values)
               (y-vals random-y-values))
      (if (pair? x-vals)
          (begin
            (op (car x-vals)
                (car y-vals))
            (loop (cdr x-vals)
                  (cdr y-vals))))))

and a set of 4000000 pre-generated random pairs of bignums, via
(random 1000000000000000000), which all turned out to be bignums, I
get the following results:

  $ ./run-benchmark guile-1.6 ./guile-1.7 ./guile-gmp foo.scm
  starting trials
  ("bench-random-op +" "guile-1.6" ((user . 201) (sys . 0) (gc . 90)))
  ("bench-random-op -" "guile-1.6" ((user . 179) (sys . 1) (gc . 57)))
  ("bench-random-op *" "guile-1.6" ((user . 219) (sys . 0) (gc . 99)))
  ("bench-random-op /" "guile-1.6" ((user . 262) (sys . 0) (gc . 114)))
  ("bench-random-op gcd" "guile-1.6" ((user . 121) (sys . 2) (gc . 44)))
  ("bench-random-op lcm" "guile-1.6" ((user . 275) (sys . 0) (gc . 116)))
  starting trials
  ("bench-random-op +" "./guile-1.7" ((user . 342) (sys . 1) (gc . 60)))
  ("bench-random-op -" "./guile-1.7" ((user . 315) (sys . 0) (gc . 59)))
  ("bench-random-op *" "./guile-1.7" ((user . 352) (sys . 0) (gc . 56)))
  ("bench-random-op /" "./guile-1.7" ((user . 380) (sys . 0) (gc . 55)))
  ("bench-random-op gcd" "./guile-1.7" ((user . 266) (sys . 1) (gc . 56)))
  ("bench-random-op lcm" "./guile-1.7" ((user . 381) (sys . 0) (gc . 55)))
  starting trials
  ("bench-random-op +" "./guile-gmp" ((user . 402) (sys . 1) (gc . 67)))
  ("bench-random-op -" "./guile-gmp" ((user . 386) (sys . 0) (gc . 63)))
  ("bench-random-op *" "./guile-gmp" ((user . 436) (sys . 0) (gc . 65)))
  ("bench-random-op /" "./guile-gmp" ((user . 422) (sys . 0) (gc . 65)))
  ("bench-random-op gcd" "./guile-gmp" ((user . 387) (sys . 0) (gc . 62)))
  ("bench-random-op lcm" "./guile-gmp" ((user . 463) (sys . 0) (gc . 64)))

Bear in mind that there may well be mistakes in my code affecting this
result.

The second (perhaps related) issue involves memory use.  It turns out
that an mpz_t, which is the GMP integer type, takes up a minimum of 12
bytes (when an int is 4 bytes), and will always be more than that
because that 12 bytes doesn't include any space for the actual digits.
To make this clear, here is what an mpz_t looks like:

  typedef struct
  {
    int _mp_alloc;                /* Number of *limbs* allocated and pointed
                                     to by the _mp_d field.  */
    int _mp_size;                 /* abs(_mp_size) is the number of limbs the
                                     last field points to.  If _mp_size is
                                     negative this is a negative number.  */
    mp_limb_t *_mp_d;             /* Pointer to the limbs.  */
  } __mpz_struct;

Right now, in the new code a guile gmp bignum is a smob whose data
pointer is an mpz_t*.  Including the smob mpz_t pointer itself, this
gives us a minimum size of 24-bytes for an 8-byte bignum on many 32-bit
machines, and 40-bytes on many 64-bit machines :<

It's also worth noting that GMP never shrinks its allocations for a
given mpz_t, even if you change a giant integer to a much smaller
integer via subtraction or similar.  However, this may not really
matter much to us since we essentially always create a new value for
the result of an operation.

Our old implementation, on the other hand, if I understand it
correctly, would usually use a minimum of 4 bytes for a bignum,
actually probably 6, and so would require 10 bytes for it's smallest
bignum, including the smob pointer, and could grow in 2 byte
increments (the increment size usually depended on sizeof (unsigned
short)).  So with the old implementation, an 8-byte bignum would
probably require about 13-14 bytes, including the smob ptr.  Perhaps
more importantly, I believe the old implementation allowed bignums as
small as 6-bytes (10 with ptr).  With respect to storage use, this is
a much nicer jump from a 4-byte fixnum than GMP's 20-byte likely
minimum -- which would only get you a 4-byte integer.

So the first question is "Is GMP's minimum usage unacceptable?".  If
so, then there are a variety of things we could consider (thanks to
Dale for discussing this with me on irc at some length):

  1) possibly scrap GMP altogether.

  2) possibly try to use long-longs to provide an intermediate-sized
     rep where available, which could use "native" ops.  This
     intermediate value would require 12-bytes including smob ptr and
     provide an 8-byte int (presuming long long == 64-bit).

  3) possibly use mpz_t values only for computation, allocating them
     on the stack, and use the old "array of shorts" format in the
     heap.  The big cost here would be conversions to/from mpz_t and
     the array of shorts, but we would be able to allocate in 2-byte
     increments, with the same minimum sizes we used to have.  With
     respect to the costs, we'd have to know how much of the cost of
     (+ big-x big-y) is currently comprised of non-math operations
     i.e. how much would the conversion cost relative to the current
     interpretation overhead, etc.

  4) possibly use GMP's lower-level ops, and manage the storage of the
     GMP mp_limb_t array ourselves.  This should allow us to ditch
     mpz_t's _mp_alloc int field, and it would probably be safe to
     replace the _mp_size field with a short, since I doubt that we'd
     ever be likely to overflow long data[MAX_SHORT] digits. (What was
     the size limit, if any, on our old bignums?).

     The disadvantage here is that there would be much more code to
     write by hand -- we'd have to implement bignum
     addition/subtraction etc. ourselves from the GMP primitives,
     since these primitives only operate on individual _mp_limb_t
     values (i.e. digits).  We'd have to handle carry, borrow etc.,
     ourselves.

     The advantage is that these primitives are supposed to be very
     fast, since the common ops are likely to be coded in assembly on
     most platforms.  This approach would get us a minimum bignum
     size, including the smob ptr, of 18-bytes for an 8-byte bignum
     (10-bytes of overhead), presuming 32-bit arch and 32-bit longs.
     The smallest bignum we could represent would be (still presuming
     32-bit longs) 4-bytes, which including the smob ptr, would
     require 14-bytes of storage. 

  5) use more than one smob to represent bignums.  While there are a
     variety of ways this could work, one would be to have something
     like:

         big-2: mp_limb_t data[2]  (12-bytes incl smob ptr on 32-bit)
         big-4: mp_limb_t data[4]  (16-bytes incl smob ptr on 32-bit)
         big-n: mp_limb_t data*    (8-bytes + data, incl smob ptr on 32-bit)

      This would probably be the best use of space if using GMP, but
      it may well complicate the implementation of numbers.c too much
      and still has the same drawbacks as (4).

  6) try a super-ugly-hack (super-ugly, presuming the GMP people
     consider it not-kosher to touch mpz_t internals) where we perform
     all our operations as mpz_t values, but then convert them to a
     representation like this:

         struct _our_mpz { mp_limb_t *data; short size; }

     in the heap, after just copying the _mp_d data pointer out of the
     result mpz_t, dropping the _mp_alloc value (after checking to be
     sure alloc == size -- if not, we'd have to mpz_realloc2), and
     then putting their _mp_size into our size (after checking to be
     sure it will fit).  Then later, when we needed to perform
     computations with our representation, we'd copy this data back
     into a new mpz_t (mpz_t's are actually 1-element arrays of the
     _mpz_struct):

        mpz_t tmp;
        tmp[0]._mp_alloc = our_mpz->size;
        tmp[0]._mp_size = our_mpz->size;
        tmp[0]._mp_d = our_mpz->data;

     Though this mucks around in the gmp internals, it does have the
     distinct advantage of letting us continue to use the high-level
     mpz_ interface and still reduce the mpz_t storage overhead from
     12-bytes minimum overhead (excluding the smob ptr) to 6 bytes on
     32-bit machines, and it requires only *one* in-memory
     representation of the integer digits, with no conversions.

     If this was an option we had any interest in, it might be worth
     discussing with the GMP people.  If they don't want people
     getting/setting mpz_t internals (even when you think you know
     what you're doing) then perhaps they would be willing to
     formalize something with similar effect, as part of the API,
     i.e. perhaps something like:

       void
       mpz_compact_and_get_data (mpz_t src, int *size, mpz_limb_t **data);

     or similar.  This would first make sure that the limb data is
     only as large as the integer that the digits represent (i.e. it
     would realloc if needed, so that _mp_alloc becomes redundant b/c
     _mp_alloc == _mp_size) and then it would return the pointer to
     the limbs through *data, and the limb count through *size.  This
     would then be sufficient for us to manage our conversion.

All of this said, if there are others interested in this issue, I feel
like whatever we do I should probably make the (still not quite
finished -- i.e. random.c's bignum random is a dummy for now) gmp code
available somehow, whether in a CVS branch, patch, or similar so that
other people can test too.  I'd rather not have our decision based
solely on my evaluation since it's always possible I goofed somewhere,
either in the implementation or the testing.

Thanks

-- 
Rob Browning
rlb @defaultvalue.org, @linuxdevel.com, and @debian.org
Previously @cs.utexas.edu
GPG starting 2002-11-03 = 14DD 432F AE39 534D B592  F9A0 25C8 D377 8C7E 73A4




reply via email to

[Prev in Thread] Current Thread [Next in Thread]