gcl-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Gcl-devel] GCL memory allocation and GC problems


From: Camm Maguire
Subject: Re: [Gcl-devel] GCL memory allocation and GC problems
Date: 01 Dec 2003 17:39:32 -0500
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2

Greetings, Vadim, and thank you *so* much for looking into this
important area.  While straightforward in principle, I think we should
begin now a thorough discussion of gbc in GCL and where we want to go
with it.

Your summary below is correct.  The behavior you see regarding
relocatable blocks is due to the fact that gbc calls for relocatable
pages are triggered not only when the relocatable area gets full, but
also when the heap needs expanding beyond the holesize amount.
Instead of more relocatable space, you need either a larger hole, or
more pages of whatever type happen to be filling up the heap.  I'm
sure you know already that bignums consume contiguous pages by
default, but can be set to use relocatable storage for some
performance gain.  I'm assuming bignums are the bottleneck in
ratsimp.  So unless you are using SET-GMP-RELOCATABLE, I'd also
suggest trying more contiguous pages.

I'd greatly appreciate your opinion on what defaults we should use
both for growth parameters and initial allocations by page type, and
how you come to your conclusion.  I see a growth parameter suggestion
below -- should it vary by page type?

It would also be nice to compare the number of gbc's triggered, the
speed of these gbc's per page, and the amount of garbage generated
with maxima compiled on cmucl.  If we determine that we generate too
much garbage internally to GCL, we can make use of the dynamic-extent
declaration when possible and use fast alloca at the C level instead
of malloc.  We may even be able to infer this declaration in the
compiler. 

One last item -- it appears as though you made a recent commit to some
core files, h/object.h and alloc.c if I recall, both to CVS head and
2.6.1.  Was this perhaps an accidental commit of your experimental
work below, or was it deliberate?  These changes are OK, and are
staying in, but I'd prefer some discussion before committing changes
to the core files, and we really should not touch the stable branch at
all without a known bug being fixed and some discussion first, I
think.  Please let me know if you think this policy is too
conservative.  I'm pretty flexible on it.

Take care,

"Vadim V. Zhytnikov" <address@hidden> writes:

> Hi!
> 
> I've been trying to pin down the reason of GCL
> performance problems.  The trouble is clearly related to
> memory allocation strategy and garbage collection.
> Roughly situation stands as follows:
> often computation time for some problem with GCL
> (I use Maxima as test program) is longer than for other CL
> implementations (Clisp, CMUCL). And the bigger the problem
> the worse is GCL's performance.  More thorough investigation
> shows that in such situation GCL spends 90-95% of computation
> time by doing garbage collections.  Net time for
> useful computation = (total time) - (GC time) is quite impressive.
> It compares to one of CMUCL (or even better).  But 95% of
> GC time spoils whole picture. The situation is far from being
> satisfactory. So my objective is to propose some improvements
> or to locate a bottleneck.
> 
> To proceed further I have to say a few words about
> GCL memory layout. Forgive me if I'm too verbose.
> 
> GCL's memory is divided into three non overlapping
> parts - heap area, relocatable blocks area and a hole
> between them.  Hole is just a free space. It shrinks
> when heap or relocatable blocks grow (toward each other).
> Memory is allocated by pages.
> 
> Each GCL object belongs to one of implementation types.
> Implementation types roughly correspond to CL types but
> not exactly.  For example compiled functions map to several
> implementation types.  On the other hand implementation type
> SPICE is used for internal purpose and has no corresponding
> lisp type.
> 
> Implementation types are classified according to size of the object.
> Take a look at the output of the function (room).  Each of the
> first eight lines correspond to one implementation type class.
> 
>    103/211   69.1%  1 CONS RATIO LONG-FLOAT COMPLEX STRUCTURE
>      1/28    12.7%    FIXNUM SHORT-FLOAT CHARACTER RANDOM-STATE READTABLE
>     49/49    73.3%    SYMBOL STREAM
>      1/2     14.1%    PACKAGE
>      1/38    47.9%    ARRAY HASH-TABLE VECTOR BIT-VECTOR PATHNAME
> CCLOSURE FAT-STRING
>     22/32    85.0%    STRING
>      3/27    95.6%    CFUN BIGNUM
>      6/6     86.7%    SFUN GFUN CFDATA SPICE NIL
> 
>     10/194            contiguous (21 blocks)
>        118            hole
>        50     9.4%    relocatable
> 
>     186 pages for cells
>     364 total pages
> 125151 pages available
>    5557 pages in heap but not gc'd + pages needed for gc marking
> 131072 maximum pages
> 
> Numbers on the left
>      AAA/BBB CC% D
> are
>      AAA - number of pages currently allocated for this class
>      BBB - maximal number of pages for this class
>      CC% - percent of space really occupied by cells
>       D  - number of GC occurred for this class, empty
>            if zero.
> All objects of one implementation type class share
> pages and their allocation and GC is managed together.
> 
> Usually objects are stored in heap.  Some of them (e.g. CONS
> - list nodes) resides entirely in heap.  For other objects
> (functions, arrays, symbols, etc) heap contains only header.
> The body of the object is stored in blocks.   There are two
> kinds of blocks. Contiguous blocks are stored in heap.
> Relocatable blocks are stored in relocatable area.
> Contiguous blocks as other objects in heap are newer
> moved.  Relocatable blocks can be compactified during GC.
> 
> When space for cell of some class is exhausted GCL starts
> garbage collection.  If after GC percent of free cells
> is lower than some thresold then GCL allocates some extra pages.
> There is a function which controls this process
> 
> (si::allocate-growth
>       <implementation-type>
>       min_grow
>       max_grow
>       percent_grow
>       percent_free )
> 
> If after GC for cells of <implementation-type> percent of free
> cells is less than percent_free then total number of pages
> is enlarged on percent_grow percents but not less than min_grow
> and no more than max_grow.
> 
> I'm sorry for such lengthly introduction but I need this
> for reference.  We come to the point after all.
> 
> The default values for these 4 parameters are
>      1 1000 50 10
> This numbers partially explains GCL performance problems.
> I think the they are not adequate to modern needs.
> First, allocation threshold  - 10% is too low.  In general if lisp
> works with low percent of free cells the frequency
> of GC rapidly grows.  Second, the maximal number
> of pages which can by allocated 1000 seems to be
> very small.  Such low values for these parameters
> makes GCL quite lazy at allocating extra memory.
> I think that something like
>      1 10000 50 30
> or even
>      1 10000 100 50
> is more up to date.
> 
> Now I'd like to show result of some simple
> tests which demonstrates how this machinery
> works in practice and also reveals some other
> problem.
> 
> I'm going to compute ratsimp((x+y+z)^500)$ and see what happens.
> On Athlon XP+ 2400 with 512Mb the result for
>     cmucl 18e             -   4 sec
>     clisp 2.31            -  16 sec
>     gcl 2.6.1 traditional -  23 sec
>     gcl 2.6.1 ansi        -  27 sec
> I use current gcl 2.6.1 cvs and Maxima cvs of September 9 2003
> (current Maxima cvs will not build on traditional gcl).
> 
> The maxima-init.lisp is
> ==============================================
> #+gcl
> (progn
>     (si::allocate-relocatable-pages 2000 t)
>     (si::allocate 'cons 2000 t)
>     (si::allocate 'fixnum 200 t)
>     #-gmp (si::allocate 'cfun 200 t)
>     #+gmp (si::allocate 'cfun 1000 t)
>     (si::allocate 'symbol 100 t)
>     #+gmp (si::set-gmp-allocate-relocatable t)
>     ;;(setq si::*notify-gbc* t)
>     )
> ===============================================
> 
> Now let's consider GCL timing more thoroughly.
> If I feed
>     showtime:true;
>     :lisp (setq si::*notify-gbc* t)
>     :lisp (si::gbc-time 0)
>     :lisp (room)
>     ratsimp((x+y+z)^500)$
>     :lisp (si::gbc-time)
>     :lisp (room)
> into Maxima I get the following output
> =============================================================
> GCL (GNU Common Lisp)  (2.6.1) Thu Nov 20 20:56:28 MSK 2003
> Licensed under GNU Library General Public License
> Dedicated to the memory of W. Schelter
> 
> Use (help) to get some basic information on how to use GCL.
> Maxima 5.9.0.1cvs http://maxima.sourceforge.net
> Distributed under the GNU Public License. See the file COPYING.
> Dedicated to the memory of William Schelter.
> This is a development version of Maxima. The function bug_report()
> provides bug reporting information.
> (C1)
> Evaluation took 0.00 seconds (0.00 elapsed)
> (D1)                               TRUE
> (C2)
> T
> (C2)
> 0
> (C2) 2000/2000 28.2%CONS RATIO LONG-FLOAT COMPLEX STRUCTURE
>    200/200    0.4%    FIXNUM SHORT-FLOAT CHARACTER RANDOM-STATE
> READTABLE NIL
>    135/135   98.1%    SYMBOL STREAM
>      1/2     19.2%    PACKAGE
>     11/38    46.6%    ARRAY HASH-TABLE VECTOR BIT-VECTOR PATHNAME
> CCLOSURE FAT-STRING
>     68/72    96.4%    STRING
> 1000/1000   1.1%    CFUN BIGNUM
>     26/28    97.3%    SFUN GFUN CFDATA SPICE NIL
> 
>      4/336            contiguous (4 blocks)
>        128            hole
>        2000   0.3%  3 relocatable
> 
>    3441 pages for cells
>    5573 total pages
> 116770 pages available
>    8729 pages in heap but not gc'd + pages needed for gc marking
> 131072 maximum pages
> (C2)
> [GC for 2000 CONS pages..(T=6).GC finished]
> [GC for 2000 CONS pages..(T=8).GC finished]
> [GC for 2000 RELOCATABLE-BLOCKS pages..(T=10).GC finished]
> [GC for 2000 CONS pages..(T=10).GC finished]
> make_cons calls grow_linear ...
>     grow_linear old=2000 delt=1000 new=3000
> [GC for 2000 RELOCATABLE-BLOCKS pages..(T=13).GC finished]
> [GC for 2000 RELOCATABLE-BLOCKS pages..(T=14).GC finished]
> [GC for 2000 RELOCATABLE-BLOCKS pages..(T=14).GC finished]
> [GC for 2000 RELOCATABLE-BLOCKS pages..(T=14).GC finished]
> [GC for 2000 RELOCATABLE-BLOCKS pages..(T=16).GC finished]
> [GC for 2000 RELOCATABLE-BLOCKS pages..(T=16).GC finished]
> [GC for 2000 RELOCATABLE-BLOCKS pages..(T=17).GC finished]
> [GC for 2000 RELOCATABLE-BLOCKS pages..(T=18).GC finished]
> [GC for 2000 RELOCATABLE-BLOCKS pages..(T=19).GC finished]
> alloc_relblock calls grow_linear ...
>     grow_linear old=2000 delt=1000 new=3000
> [GC for 3000 RELOCATABLE-BLOCKS pages..(T=20).GC finished]
> [GC for 3000 CONS pages..(T=14).GC finished]
> make_cons calls grow_linear ...
>     grow_linear old=3000 delt=1000 new=4000
> [GC for 3000 RELOCATABLE-BLOCKS pages..(T=20).GC finished]
> [GC for 3000 RELOCATABLE-BLOCKS pages..(T=20).GC finished]
> [GC for 3000 RELOCATABLE-BLOCKS pages..(T=22).GC finished]
> [GC for 3000 RELOCATABLE-BLOCKS pages..(T=23).GC finished]
> [GC for 3000 RELOCATABLE-BLOCKS pages..(T=23).GC finished]
> [GC for 3000 RELOCATABLE-BLOCKS pages..(T=24).GC finished]
> [GC for 3000 RELOCATABLE-BLOCKS pages..(T=23).GC finished]
> [GC for 3000 RELOCATABLE-BLOCKS pages..(T=25).GC finished]
> [GC for 3000 RELOCATABLE-BLOCKS pages..(T=26).GC finished]
> [GC for 3000 RELOCATABLE-BLOCKS pages..(T=25).GC finished]
> alloc_relblock calls grow_linear ...
>     grow_linear old=3000 delt=1000 new=4000
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=26).GC finished]
> [GC for 4000 CONS pages..(T=19).GC finished]
> make_cons calls grow_linear ...
>     grow_linear old=4000 delt=1000 new=5000
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=28).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=28).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=28).GC finished]
> [GC for 1000 CFUN pages..(T=20).GC finished]
> alloc_objects allocs pages ...
>     grow_linear old=1000 delt=500 new=1500
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=18).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=23).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=23).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=25).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=25).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=25).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=25).GC finished]
> [GC for 5000 CONS pages..(T=20).GC finished]
> make_cons calls grow_linear ...
>     grow_linear old=5000 delt=1000 new=6000
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=25).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=25).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=26).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=26).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=26).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=27).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=27).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=27).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=27).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=28).GC finished]
> [GC for 6000 CONS pages..(T=23).GC finished]
> make_cons calls grow_linear ...
>     grow_linear old=6000 delt=1000 new=7000
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=28).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=28).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=28).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=29).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=30).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=25).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=29).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=31).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=34).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=35).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=36).GC finished]
> [GC for 7000 CONS pages..(T=30).GC finished]
> make_cons calls grow_linear ...
>     grow_linear old=7000 delt=1000 new=8000
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=37).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=38).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=38).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=39).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=38).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=39).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=40).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=41).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=42).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=42).GC finished]
> [GC for 8000 CONS pages..(T=35).GC finished]
> make_cons calls grow_linear ...
>     grow_linear old=8000 delt=1000 new=9000
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=43).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=41).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=43).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=44).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=45).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=44).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=44).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=46).GC finished]
> Evaluation took 24.38 seconds (24.50 elapsed)
> (C3)
> 2202
> (C3) 8751/9000 99.0%  9CONS RATIO LONG-FLOAT COMPLEX STRUCTURE
>    200/200    0.4%    FIXNUM SHORT-FLOAT CHARACTER RANDOM-STATE
> READTABLE NIL
>    135/135   98.1%    SYMBOL STREAM
>      1/2     19.2%    PACKAGE
>     11/38    45.1%    ARRAY HASH-TABLE VECTOR BIT-VECTOR PATHNAME
> CCLOSURE FAT-STRING
>     68/72    81.4%    STRING
> 1000/1500  50.2%  1 CFUN BIGNUM
>     26/28    97.3%    SFUN GFUN CFDATA SPICE NIL
> 
>      4/336            contiguous (5 blocks)
>        128            hole
>        4000  58.3% 74 relocatable
> 
> 10192 pages for cells
> 14324 total pages
> 106019 pages available
> 10729 pages in heap but not gc'd + pages needed for gc marking
> 131072 maximum pages
> (C3)
> ===========================================================
> First of all take a look at 2202 right after first (C3).
> This is total time of GC in 1/100 sec.  So 22.02 seconds
> of total 24.38 are spent for garbage collections - 90%.
> We have 9 GC for CONS cells, 1 for CFUN and 71 for
> relocatable blocks.
> 
> Let's try to improve situation by adding
>      :lisp (si::allocate-growth 'cons 1 10000 100 50)
> to the input. New result is (I show bottom lines only)
> ===========================================================
> .....
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=45).GC finished]
> [GC for 4000 RELOCATABLE-BLOCKS pages..(T=45).GC finished]
> Evaluation took 23.00 seconds (23.13 elapsed)
> (C3)
> 2064
> (C3) 8751/16000 99.0%  3CONS RATIO LONG-FLOAT COMPLEX STRUCTURE
>    200/200    0.4%    FIXNUM SHORT-FLOAT CHARACTER RANDOM-STATE
> READTABLE NIL
>    135/135   98.1%    SYMBOL STREAM
>      1/2     19.2%    PACKAGE
>     11/38    45.1%    ARRAY HASH-TABLE VECTOR BIT-VECTOR PATHNAME
> CCLOSURE FAT-STRING
>     68/72    81.4%    STRING
> 1000/1500  50.2%  1 CFUN BIGNUM
>     26/28    97.3%    SFUN GFUN CFDATA SPICE NIL
> 
>      4/336            contiguous (5 blocks)
>        128            hole
>        4000  58.3% 73 relocatable
> 
> 10192 pages for cells
> 14324 total pages
> 106019 pages available
> 10729 pages in heap but not gc'd + pages needed for gc marking
> 131072 maximum pages
> (C3)
> =============================================================
> Now we have only 3 garbage collections for CONS cells.
> Very good since GC time for CONS is now 3 times shorter.
> 
> Let's try to do something like this with relocatable pages
>     :lisp (si::allocate-growth 'relocatable 1 10000 200 80)
> 
> The result is really strange:
> ==============================================================
> [GC for 6000 RELOCATABLE-BLOCKS pages..(T=45).GC finished]
> [GC for 6000 RELOCATABLE-BLOCKS pages..(T=46).GC finished]
> Evaluation took 22.43 seconds (22.59 elapsed)
> (C3)
> 2020
> (C3) 8745/16000100.0%  3CONS RATIO LONG-FLOAT COMPLEX STRUCTURE
>    200/200    0.4%    FIXNUM SHORT-FLOAT CHARACTER RANDOM-STATE
> READTABLE NIL
>    135/135   98.1%    SYMBOL STREAM
>      1/2     19.2%    PACKAGE
>     11/38    45.1%    ARRAY HASH-TABLE VECTOR BIT-VECTOR PATHNAME
> CCLOSURE FAT-STRING
>     68/72    81.4%    STRING
> 1000/1000  50.2%    CFUN BIGNUM
>     26/28    97.3%    SFUN GFUN CFDATA SPICE NIL
> 
>      4/336            contiguous (5 blocks)
>        77             hole
>        6000  38.9% 72 relocatable
> 
> 10186 pages for cells
> 16267 total pages
> 102025 pages available
> 12780 pages in heap but not gc'd + pages needed for gc marking
> 131072 maximum pages
> (C3)
> ==============================================================
> In spite of the fact that more aggressive memory allocation
> for relocatable blocks works (compare 6000 final pages with
> 4000 in previous test) the number and total time of GC for
> relocatable blocks remains virtually unchanged!
> 
> Let's do another trick.  We allocate a large space
> (30000 pages) for relocatable blocks at the beginning.
> I expect that now number of GC for relocatable blocks
> should be much smaller.  Nothing of the kind!
> ==============================================================
> [GC for 30000 RELOCATABLE-BLOCKS pages..(T=46).GC finished]
> Evaluation took 22.95 seconds (23.05 elapsed)
> (C3)
> 2066
> (C3) 8762/16000 98.7%  3CONS RATIO LONG-FLOAT COMPLEX STRUCTURE
>    200/200    0.4%    FIXNUM SHORT-FLOAT CHARACTER RANDOM-STATE
> READTABLE NIL
>    135/135   98.1%    SYMBOL STREAM
>      1/2     19.2%    PACKAGE
>     11/38    45.1%    ARRAY HASH-TABLE VECTOR BIT-VECTOR PATHNAME
> CCLOSURE FAT-STRING
>     68/72    81.4%    STRING
> 1000/1500  50.2%  1 CFUN BIGNUM
>     26/28    97.3%    SFUN GFUN CFDATA SPICE NIL
> 
>      4/336            contiguous (5 blocks)
>        128            hole
>        30000  7.8% 72 relocatable
> 
> 10203 pages for cells
> 40335 total pages
> 54008 pages available
> 36729 pages in heap but not gc'd + pages needed for gc marking
> 131072 maximum pages
> (C3)
> =============================================================
> Once again number and time of GC for relocatable blocks remains
> practically unchanged.
> 
> And this is vary alarming.  GCL spends 90% of time
> doing GC for relocatable blocks and we seems to be unable
> to improve the situation.  At the moment I simply don't
> understand how GC rate can be the same for 2000 and
> 30000 initial pages. Probably I miss something important
> about GC for relocatable blocks.  But in any case
> I think that situation is far from being perfect.
> Any comments and ideas are greatly appreciated.
> 
> PS: All transcripts of my tests are attached.
> 
> -- 
>        Vadim V. Zhytnikov
> 
>         <address@hidden>
>        <address@hidden>
> 
> 
> 
> 
> 
> 
> 
> 
> _______________________________________________
> Gcl-devel mailing list
> address@hidden
> http://mail.gnu.org/mailman/listinfo/gcl-devel

-- 
Camm Maguire                                            address@hidden
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah




reply via email to

[Prev in Thread] Current Thread [Next in Thread]