guile-devel
CPU and GC cost of bignums


From: Ludovic Courtès
Subject: CPU and GC cost of bignums
Date: Tue, 04 Feb 2020 17:56:51 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)

Hello!

(If you’re in a hurry, there is good news at the bottom.)

I noticed that Guile 3.0 (and 2.2 as well) takes a long time to compile
Guix’ gnu/services/mail.scm, which is macro-heavy and produces lots of
top-level defines.

At -O2 (the default), we have:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ,pr (compile-file "gnu/services/mail.scm")
%     cumulative   self             
time   seconds     seconds  procedure
 13.79     19.16     14.46  language/cps/slot-allocation.scm:846:19
 11.05     11.58     11.58  language/cps/intmap.scm:396:0:intmap-ref
  7.56     12.63      7.92  anon #x10768e0
  6.61      7.70      6.92  ice-9/popen.scm:145:0:reap-pipes
  5.50    182.23      5.76  language/cps/intset.scm:470:5:visit-branch
  4.65      4.87      4.87  system/vm/linker.scm:179:0:string-table-intern!
  4.07      5.04      4.26  ice-9/vlist.scm:534:0:vhash-assoc
  3.54      3.93      3.71  language/cps/intmap.scm:184:0:intmap-add!
  3.28      6.65      3.43  language/cps/intset.scm:270:2:adjoin
  2.70      2.82      2.82  language/cps/intset.scm:349:0:intset-ref
  1.80     34.84      1.88  language/cps/intmap.scm:247:2:adjoin
  1.80      5.93      1.88  language/cps/intset.scm:269:0:intset-add
  1.74     18.22      1.83  language/cps/intmap.scm:246:0:intmap-add
  1.22      3.27      1.27  language/cps/intset.scm:382:2:visit-node
  1.16      2.94      1.22  language/cps/intset.scm:551:2:union
  1.11      1.38      1.16  language/cps/intset.scm:204:0:intset-add!
  0.74   1281.59      0.78  language/cps/intset.scm:472:5:visit-branch

[...]

Sample count: 1892
Total time: 104.795540582 seconds (85.091574653 seconds in GC)
--8<---------------cut here---------------end--------------->8---

At -O1:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ,use(system base optimize)
scheme@(guile-user)> ,pr (compile-file "gnu/services/mail.scm" #:opts (optimizations-for-level 1))
%     cumulative   self             
time   seconds     seconds  procedure
 11.76    129.78      7.60  language/cps/intset.scm:470:5:visit-branch
 10.77      6.96      6.96  language/cps/intmap.scm:396:0:intmap-ref
 10.43     11.69      6.74  language/cps/slot-allocation.scm:846:19
  8.99      7.39      5.81  ice-9/vlist.scm:534:0:vhash-assoc
  7.55      4.88      4.88  system/vm/linker.scm:179:0:string-table-intern!
  6.44      4.16      4.16  ice-9/popen.scm:145:0:reap-pipes
  4.22      2.80      2.72  language/cps/intmap.scm:184:0:intmap-add!
  1.89      1.86      1.22  language/cps/slot-allocation.scm:681:17
  1.89      1.43      1.22  ice-9/vlist.scm:539:0:vhash-assq
  1.78      1.51      1.15  language/cps/slot-allocation.scm:505:17
  1.22      1.36      0.79  language/cps/slot-allocation.scm:846:19
  1.22      1.08      0.79  language/cps/slot-allocation.scm:505:17

[...]

Sample count: 901
Total time: 64.602907835 seconds (55.87541493 seconds in GC)
--8<---------------cut here---------------end--------------->8---

language/cps/slot-allocation.scm:846:19 corresponds to:

    (define (compute-live-slots* slots label live-vars)
      (intset-fold (lambda (var live)
                     (match (get-slot slots var)
                       (#f live)
                       (slot (add-live-slot slot live))))    ;L846
                   (intmap-ref live-vars label)
                   0))

GC time remains extremely high though, and it also comes from
‘compute-live-slots*’:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (gcprof (lambda () (compile-file "gnu/services/mail.scm" #:opts (optimizations-for-level 1))))
%     cumulative   self             
time   seconds     seconds  procedure
 58.14     34.56     34.56  language/cps/slot-allocation.scm:846:19
  8.01      4.76      4.76  language/cps/slot-allocation.scm:681:17
  8.01      4.76      4.76  language/cps/slot-allocation.scm:505:17
  6.98      4.15      4.15  language/cps/slot-allocation.scm:505:17
  6.46      3.84      3.84  language/cps/slot-allocation.scm:846:19
  1.29      0.77      0.77  anon #x23e88e0

[...]

Sample count: 387
Total time: 59.442422179 seconds (50.331193744 seconds in GC)
--8<---------------cut here---------------end--------------->8---

(I believe Guile commit 5675e46410c9a24b05ddf58cbe3b998a4c9cad7c and its
parent were made to optimize the -O1 case back in 2017¹.)

‘compute-live-slots*’ returns an integer and the allocation comes from
line 846, where we allocate a bignum, in this case a verybignum even.
And for each bignum, we register a finalizer, which itself takes space.
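
To make the bignum point concrete, here is a minimal C sketch of the
live-slot bitmask.  It assumes that ‘add-live-slot’ sets bit SLOT in the
accumulated integer, and that we are on a 64-bit platform where fixnums
carry 62 bits, so the largest fixnum is 2^61 - 1.  The names
‘add_live_slot’ and ‘fits_in_fixnum’ are made up for this illustration,
and the model stops at 64 slots, whereas the real value keeps growing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64-bit Guile, where a fixnum has 62 bits of payload
   (sign included), so the largest fixnum is 2^61 - 1.  */
#define FIXNUM_BITS 62
#define MOST_POSITIVE_FIXNUM ((INT64_C (1) << (FIXNUM_BITS - 1)) - 1)

/* Hypothetical C counterpart of `add-live-slot': set bit SLOT in LIVE.
   This model only covers slots below 64.  */
static uint64_t
add_live_slot (unsigned slot, uint64_t live)
{
  assert (slot < 64);
  return live | (UINT64_C (1) << slot);
}

/* Return true if LIVE would still be an immediate fixnum, false if it
   would have to be heap-allocated as a bignum.  */
static bool
fits_in_fixnum (uint64_t live)
{
  return live <= (uint64_t) MOST_POSITIVE_FIXNUM;
}
```

Under these assumptions, a function whose live variables all sit in
slots 0 to 60 never leaves fixnum territory, while a single live slot
numbered 61 or above forces a bignum for every intermediate value of the
fold.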

(Time passes…)

The patch below (also for 2.2) gives us better timing:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (gcprof (lambda () (compile-file "gnu/services/mail.scm" #:opts (optimizations-for-level 1))))
%     cumulative   self             
time   seconds     seconds  procedure
 18.75      2.49      2.49  anon #x6f58e0

[...]

Sample count: 32
Total time: 13.290191232 seconds (4.584969888 seconds in GC)
--8<---------------cut here---------------end--------------->8---

… but has the disadvantage that it doesn’t work: ‘numbers.test’ fails
badly on bignums.

However, it turns out that removing the ‘mp_set_memory_functions’ call
works, and the result is:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (gcprof (lambda () (compile-file "gnu/services/mail.scm" #:opts (optimizations-for-level 1))))
%     cumulative   self             
time   seconds     seconds  procedure
 20.00      2.60      2.60  anon #x12578e0
 10.00      3.47      1.30  language/cps/intset.scm:270:2:adjoin
  6.67      0.87      0.87  ice-9/boot-9.scm:2201:0:%load-announce
  6.67      0.87      0.87  anon #x1253160
  3.33    146.48      0.43  ice-9/threads.scm:388:4
  3.33      1.30      0.43  language/cps/intset.scm:759:8:lp
  3.33      0.87      0.43  system/vm/assembler.scm:2854:4:write-die
  3.33      0.43      0.43  language/cps/slot-allocation.scm:843:19
  3.33      0.43      0.43  language/cps/intmap.scm:167:0:persistent-intmap

[...]

Sample count: 30
Total time: 13.001181844 seconds (4.278418897 seconds in GC)
--8<---------------cut here---------------end--------------->8---

That’s about 4.5 times faster than what we have now (13 seconds instead
of 59 at -O1).

Andy, anything against removing that ‘mp_set_memory_functions’ call
altogether, or having ‘scm_install_gmp_memory_functions’ default to 0?
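
For anyone who wants to measure how much memory GMP requests for digits,
the hooks accepted by ‘mp_set_memory_functions’ can be replaced with
counting wrappers.  The sketch below is not from libguile; the
‘counting_*’ names, the counters, and the ‘gmp_hooks’ struct are
hypothetical.  It only mirrors the three callback signatures documented
by GMP and deliberately does not include <gmp.h>, so it compiles on its
own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counters: number of allocations and bytes currently held.  */
static size_t alloc_calls;
static size_t live_bytes;

/* Same shape as GMP's allocation hook.  */
static void *
counting_malloc (size_t size)
{
  void *p = malloc (size);
  assert (p != NULL);
  alloc_calls++;
  live_bytes += size;
  return p;
}

/* Same shape as GMP's reallocation hook, which receives the old size.  */
static void *
counting_realloc (void *old, size_t old_size, size_t new_size)
{
  void *p = realloc (old, new_size);
  assert (p != NULL);
  live_bytes = live_bytes - old_size + new_size;
  return p;
}

/* Same shape as GMP's free hook, which receives the block size.  */
static void
counting_free (void *ptr, size_t size)
{
  free (ptr);
  live_bytes -= size;
}

/* Grouping the hooks in a struct lets the compiler check the signatures.
   With GMP available, the three members would be passed, in this order,
   to mp_set_memory_functions.  */
struct gmp_hooks
{
  void *(*alloc) (size_t);
  void *(*realloc) (void *, size_t, size_t);
  void (*free) (void *, size_t);
};

const struct gmp_hooks counting_hooks =
  { counting_malloc, counting_realloc, counting_free };
```

This only counts what goes through the hooks; it says nothing about the
cost of the finalizers themselves.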

Thanks,
Ludo’.

¹ https://lists.gnu.org/archive/html/guile-devel/2017-10/msg00048.html

diff --git a/libguile/numbers.c b/libguile/numbers.c
index d1b463358..cf21a86ca 100644
--- a/libguile/numbers.c
+++ b/libguile/numbers.c
@@ -1,4 +1,4 @@
-/* Copyright 1995-2016,2018-2019
+/* Copyright 1995-2016,2018-2020
      Free Software Foundation, Inc.
 
    Portions Copyright 1990-1993 by AT&T Bell Laboratories and Bellcore.
@@ -218,16 +218,6 @@ static mpz_t z_negative_one;
 
 
 
-/* Clear the `mpz_t' embedded in bignum PTR.  */
-static void
-finalize_bignum (void *ptr, void *data)
-{
-  SCM bignum;
-
-  bignum = SCM_PACK_POINTER (ptr);
-  mpz_clear (SCM_I_BIG_MPZ (bignum));
-}
-
 /* The next three functions (custom_libgmp_*) are passed to
    mp_set_memory_functions (in GMP) so that memory used by the digits
    themselves is known to the garbage collector.  This is needed so
@@ -237,19 +227,20 @@ finalize_bignum (void *ptr, void *data)
 static void *
 custom_gmp_malloc (size_t alloc_size)
 {
-  return scm_malloc (alloc_size);
+  return scm_gc_malloc_pointerless (alloc_size, "GMP");
 }
 
 static void *
 custom_gmp_realloc (void *old_ptr, size_t old_size, size_t new_size)
 {
-  return scm_realloc (old_ptr, new_size);
+  return scm_gc_realloc (old_ptr, old_size, new_size, "GMP");
 }
 
 static void
 custom_gmp_free (void *ptr, size_t size)
 {
-  free (ptr);
+  /* Do nothing: all memory allocated by GMP is under GC control and
+     will be freed when needed.  */
 }
 
 
@@ -264,8 +255,6 @@ make_bignum (void)
                                 "bignum");
   p[0] = scm_tc16_big;
 
-  scm_i_set_finalizer (p, finalize_bignum, NULL);
-
   return SCM_PACK (p);
 }
