[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
CPU and GC cost of bignums
From: |
Ludovic Courtès |
Subject: |
CPU and GC cost of bignums |
Date: |
Tue, 04 Feb 2020 17:56:51 +0100 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) |
Hello!
(If you’re in a hurry, there are good news at the bottom.)
I noticed that 3.0 (and also 2.2 actually) takes a long time to compile
Guix’ gnu/services/mail.scm, which is macro-heavy, producing lots of
top-level defines.
At -O2 (the default), we have:
--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ,pr (compile-file "gnu/services/mail.scm")
% cumulative self
time seconds seconds procedure
13.79 19.16 14.46 language/cps/slot-allocation.scm:846:19
11.05 11.58 11.58 language/cps/intmap.scm:396:0:intmap-ref
7.56 12.63 7.92 anon #x10768e0
6.61 7.70 6.92 ice-9/popen.scm:145:0:reap-pipes
5.50 182.23 5.76 language/cps/intset.scm:470:5:visit-branch
4.65 4.87 4.87 system/vm/linker.scm:179:0:string-table-intern!
4.07 5.04 4.26 ice-9/vlist.scm:534:0:vhash-assoc
3.54 3.93 3.71 language/cps/intmap.scm:184:0:intmap-add!
3.28 6.65 3.43 language/cps/intset.scm:270:2:adjoin
2.70 2.82 2.82 language/cps/intset.scm:349:0:intset-ref
1.80 34.84 1.88 language/cps/intmap.scm:247:2:adjoin
1.80 5.93 1.88 language/cps/intset.scm:269:0:intset-add
1.74 18.22 1.83 language/cps/intmap.scm:246:0:intmap-add
1.22 3.27 1.27 language/cps/intset.scm:382:2:visit-node
1.16 2.94 1.22 language/cps/intset.scm:551:2:union
1.11 1.38 1.16 language/cps/intset.scm:204:0:intset-add!
0.74 1281.59 0.78 language/cps/intset.scm:472:5:visit-branch
[...]
Sample count: 1892
Total time: 104.795540582 seconds (85.091574653 seconds in GC)
--8<---------------cut here---------------end--------------->8---
At -O1:
--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ,use(system base optimize)
scheme@(guile-user)> ,pr (compile-file "gnu/services/mail.scm" #:opts
(optimizations-for-level 1))
% cumulative self
time seconds seconds procedure
11.76 129.78 7.60 language/cps/intset.scm:470:5:visit-branch
10.77 6.96 6.96 language/cps/intmap.scm:396:0:intmap-ref
10.43 11.69 6.74 language/cps/slot-allocation.scm:846:19
8.99 7.39 5.81 ice-9/vlist.scm:534:0:vhash-assoc
7.55 4.88 4.88 system/vm/linker.scm:179:0:string-table-intern!
6.44 4.16 4.16 ice-9/popen.scm:145:0:reap-pipes
4.22 2.80 2.72 language/cps/intmap.scm:184:0:intmap-add!
1.89 1.86 1.22 language/cps/slot-allocation.scm:681:17
1.89 1.43 1.22 ice-9/vlist.scm:539:0:vhash-assq
1.78 1.51 1.15 language/cps/slot-allocation.scm:505:17
1.22 1.36 0.79 language/cps/slot-allocation.scm:846:19
1.22 1.08 0.79 language/cps/slot-allocation.scm:505:17
[...]
Sample count: 901
Total time: 64.602907835 seconds (55.87541493 seconds in GC)
--8<---------------cut here---------------end--------------->8---
language/cps/slot-allocation.scm:846:19 corresponds to:
(define (compute-live-slots* slots label live-vars)
(intset-fold (lambda (var live)
(match (get-slot slots var)
(#f live)
(slot (add-live-slot slot live)))) ;L846
(intmap-ref live-vars label)
0))
The GC times remain extremely high though, and it’s also coming from
‘compute-live-slots*’:
--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (gcprof (lambda () (compile-file "gnu/services/mail.scm"
#:opts (optimizations-for-level 1))))
% cumulative self
time seconds seconds procedure
58.14 34.56 34.56 language/cps/slot-allocation.scm:846:19
8.01 4.76 4.76 language/cps/slot-allocation.scm:681:17
8.01 4.76 4.76 language/cps/slot-allocation.scm:505:17
6.98 4.15 4.15 language/cps/slot-allocation.scm:505:17
6.46 3.84 3.84 language/cps/slot-allocation.scm:846:19
1.29 0.77 0.77 anon #x23e88e0
[...]
Sample count: 387
Total time: 59.442422179 seconds (50.331193744 seconds in GC)
--8<---------------cut here---------------end--------------->8---
(I believe Guile commit 5675e46410c9a24b05ddf58cbe3b998a4c9cad7c and its
parent were made to optimize the -O1 case back in 2017¹.)
‘compute-live-slots*’ returns an integer and the allocation comes from
line 846, where we allocate a bignum, in this case a verybignum even.
And for each bignum, we register a finalizer, which itself takes space.
(Time passes…)
The patch below (also for 2.2) gives us better timing:
--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (gcprof (lambda () (compile-file "gnu/services/mail.scm"
#:opts (optimizations-for-level 1))))
% cumulative self
time seconds seconds procedure
18.75 2.49 2.49 anon #x6f58e0
[...]
Sample count: 32
Total time: 13.290191232 seconds (4.584969888 seconds in GC)
--8<---------------cut here---------------end--------------->8---
… but has the disadvantage that it doesn’t work: ‘numbers.test’ fails
badly on bignums.
However, it turns out that removing the ‘mp_set_memory_functions’ call
works, and the result is:
--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (gcprof (lambda () (compile-file "gnu/services/mail.scm"
#:opts (optimizations-for-level 1))))
% cumulative self
time seconds seconds procedure
20.00 2.60 2.60 anon #x12578e0
10.00 3.47 1.30 language/cps/intset.scm:270:2:adjoin
6.67 0.87 0.87 ice-9/boot-9.scm:2201:0:%load-announce
6.67 0.87 0.87 anon #x1253160
3.33 146.48 0.43 ice-9/threads.scm:388:4
3.33 1.30 0.43 language/cps/intset.scm:759:8:lp
3.33 0.87 0.43 system/vm/assembler.scm:2854:4:write-die
3.33 0.43 0.43 language/cps/slot-allocation.scm:843:19
3.33 0.43 0.43 language/cps/intmap.scm:167:0:persistent-intmap
[...]
Sample count: 30
Total time: 13.001181844 seconds (4.278418897 seconds in GC)
--8<---------------cut here---------------end--------------->8---
It’s 4.5 times faster than what we have now.
Andy, anything against removing that ‘mp_set_memory_functions’ call
altogether, or having ‘scm_install_gmp_memory_functions’ default to 0?
Thanks,
Ludo’.
¹ https://lists.gnu.org/archive/html/guile-devel/2017-10/msg00048.html
diff --git a/libguile/numbers.c b/libguile/numbers.c
index d1b463358..cf21a86ca 100644
--- a/libguile/numbers.c
+++ b/libguile/numbers.c
@@ -1,4 +1,4 @@
-/* Copyright 1995-2016,2018-2019
+/* Copyright 1995-2016,2018-2020
Free Software Foundation, Inc.
Portions Copyright 1990-1993 by AT&T Bell Laboratories and Bellcore.
@@ -218,16 +218,6 @@ static mpz_t z_negative_one;
-/* Clear the `mpz_t' embedded in bignum PTR. */
-static void
-finalize_bignum (void *ptr, void *data)
-{
- SCM bignum;
-
- bignum = SCM_PACK_POINTER (ptr);
- mpz_clear (SCM_I_BIG_MPZ (bignum));
-}
-
/* The next three functions (custom_libgmp_*) are passed to
mp_set_memory_functions (in GMP) so that memory used by the digits
themselves is known to the garbage collector. This is needed so
@@ -237,19 +227,20 @@ finalize_bignum (void *ptr, void *data)
static void *
custom_gmp_malloc (size_t alloc_size)
{
- return scm_malloc (alloc_size);
+ return scm_gc_malloc_pointerless (alloc_size, "GMP");
}
static void *
custom_gmp_realloc (void *old_ptr, size_t old_size, size_t new_size)
{
- return scm_realloc (old_ptr, new_size);
+ return scm_gc_realloc (old_ptr, old_size, new_size, "GMP");
}
static void
custom_gmp_free (void *ptr, size_t size)
{
- free (ptr);
+ /* Do nothing: all memory allocated by GMP is under GC control and
+ will be freed when needed. */
}
@@ -264,8 +255,6 @@ make_bignum (void)
"bignum");
p[0] = scm_tc16_big;
- scm_i_set_finalizer (p, finalize_bignum, NULL);
-
return SCM_PACK (p);
}
- CPU and GC cost of bignums,
Ludovic Courtès <=