Re: [Gcl-devel] GCL memory allocation and GC problems

From: Camm Maguire
Subject: Re: [Gcl-devel] GCL memory allocation and GC problems
Date: 16 Jan 2004 11:37:39 -0500
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2


"Vadim V. Zhytnikov" <address@hidden> writes:

> Let me clarify the misunderstanding.  When I first ran a fresh
> gcl build with the new memory layout (faster growth, larger hole)
> I noticed that right after start (room) showed more allocated
> contiguous pages than the GCL permanently installed on my system.
> My mistake was deciding that this was some suspicious by-product of
> the new memory parameters.  In fact my permanent GCL is quite old -
> September 2003.  A recent CVS GCL (before the new memory parameters)
> shows exactly the same number of contiguous pages.  So the difference
> has nothing to do with the recent changes.  But a noticeable
> difference between the current CVS and the September one really
> exists.  It can be seen either right after start or in the way GCL
> allocates large numbers of cont blocks.
> I've tried a simple test - creating a 20000-element list of large
> random bignums with (set-gmp-allocate-relocatables nil).  The
> resulting numbers with September's GCL are
>    465/676      16661 contiguous blocks
> while the same test with current GCL gives
>    796/1152     24284 contiguous blocks
> A strange difference.  But let's leave this problem aside for the
> moment.  It is not crucially important since we are now trying to
> avoid cont block allocation as much as possible.

GCL used to mark all contiguous pages as t_other on saving the system,
after a gbc.  I removed this as it appeared very wasteful, especially
with systems such as axiom which go through several preliminary saves
in the build process. (i.e. the contiguous pages even when in use were
never full, denying future modest mallocs the residual memory without
allocating new pages.)  So in brief, the pages are still in the image,
just not reported by (room) as they are t_other.  While wasteful in
terms of space, this might have been put in to save time in the
computationally expensive algorithms manipulating the contiguous
blocks.  I have not measured the effect myself, but I feel a better
approach is to improve the algorithm and/or use non-contiguous pages
wherever possible.  

Here is the snippet (from gbc.c):

/*     for (i = 0;  i < maxpage;  i++) */
/*       if ((enum type)type_map[i] == t_contiguous) */
/*      type_map[i] = (char)t_other; */
/*     cb_pointer = NULL; */
/*     maxcbpage -= ncbpage; */
/*     if (maxcbpage < 100) */
/*       maxcbpage = 100; */
/*     ncbpage = 0; */
/*     ncb = 0; */

BTW, does anyone know of optimal malloc/free algorithms?  I can look
in libc if/when time permits.  Right now we just have a linked list
of free blocks ordered by size; on allocation and deallocation it is
traversed from the beginning, stopping at the first block of the
requisite size:

void
insert_contblock(char *p, int s) {

  struct contblock **cbpp, *cbp;
  /* SGC cont pages: This used to return when s<CBMINSIZE, but we need
     to be able to sweep small (e.g. bignum) contblocks.  FIXME:
     should never be called with s<=0 to begin with.  CM 20030827*/
  if (s<=0)
    return;
  cbp = (struct contblock *)p;
  /* SGC cont pages: allocated sizes may not be zero mod CPTR_SIZE,
     e.g. string fillp, but alloc_contblock rounded up the allocation
     like this, which we follow here.  CM 20030827 */
  cbp->cb_size = ROUND_UP_PTR_CONT(s);
  for (cbpp = &cb_pointer;  *cbpp;  cbpp = &((*cbpp)->cb_link))
    if ((*cbpp)->cb_size >= s) {
      if (*cbpp==cbp) {
        fprintf(stderr,"Trying to install a circle at %p\n",cbp);
        return;
      }
      /* splice in before the first free block at least as large */
      cbp->cb_link = *cbpp;
      *cbpp = cbp;
      return;
    }
  /* no block that large yet; append at the end of the list */
  cbp->cb_link = NULL;
  *cbpp = cbp;
}

If I recall, this was the rate limiting step on Vadim's earlier
benchmarks using ratsimp with contiguous page bignums.

> Is it worth increasing the hole size?  Judge for yourself.
> Do you remember the (pass) test which we used for
> cons page allocation?  I've tried the same test on several
> CL implementations on an Athlon XP+ 2400.  These numbers are the
> times elapsed by 10 (pass) calls:
> clisp                                               -  12 sec
> cmucl                                               -  12 sec
> sbcl                                                -   8 sec

Anyone know what sbcl is doing differently here?

> GCL current CVS with new growth and hole size 512   - 166 sec
> The same but with old default hole size 128 (?)     - 785 sec
> The same with 1024 hole size                        -  83 sec
> The same but with 10000 hole size                   -  13 sec
> I think that 785 seconds is really too much.
> OK, so what?  Are these numbers _really_ important?  Maybe
> 2.5 Mb is more important?  Obviously there is no universal answer -
> this is the usual space/speed tradeoff.  Let's consider the situation
> from a practical point of view.  The main GCL customers - Maxima,
> ACL2 and Axiom - are large Lisp programs which are often used
> for large computations.  Of course they can also be used
> for something very simple, but even in such a situation is an extra
> 2.5 Mb important?  In general I don't think so, taking into account
> typical modern and not-so-modern hardware.  I could only imagine
> something like a hand-held computer where this extra RAM may be of
> real importance.  On the other hand I understand that a large
> default hole size cuts some flexibility.  So let's make the initial
> hole size a new configure option (it is not so hard, is it?)
> with the default value 4*MAXPAGE/1024 or even 8*MAXPAGE/1024.

OK, I'm going to do this, but the default needs to be MAXPAGE/10 to be
competitive in your results above.  The way we should look at it is
how many core expansions do we think are reasonable in allocating the
full MAXPAGE amount of memory?  The larger the hole, the fewer the
steps, and the faster, though some granularity issues arise in the
last step, at least with the current algorithm.  10 feels like a good
number to me.  Do you agree?

> > Here's a schematic of GCL's memory layout:
> > .text:  main                          (0x804ad30 in my image)
> > all compiled C files in o/
> > ...
> > user_match                    (0x81870a0 in my image)
> > .data:                        (0x81a53a0 in my image)
> > intermixed typed pages
> > (i.e. cons, string, contblock...)
> > (contblocks hold loaded compiled
> > lisp objects, area malloced by
> > external libs, and, formerly,
> > bignums)
> > heap_end                      (0x8451000 in my image)
> > hole (512 pages)
> > rb_pointer (relblock area)    (0x865102c  in my image)
> > core_end                      (0x871d000  in my image)
> > which corresponds to a 7Mb+ DRS reported by 'ps axvwm'.  What is
> > interesting and beyond my understanding at the moment is that the
> > image size on disk is only ~ 4.5Mb -- there must be some sort of
> > compression going on, I'd think.
> Actually the image size behaves somewhat strangely to me.  I just
> built from scratch the current CVS gcl and a few-days-ago CVS gcl.
> Current unstripped saved_gcl size - 6Mb.
> Previous - 10Mb.  Why?

There was a bug in which -g was being added to the CFLAGS in all
cases; the debugging symbols would account for the extra size in the
unstripped image.

On a separate but related note, as much as I like the bfd
quasi-portable relocation scheme, it does take up almost an extra
meg.  Do we want to default to the old sfaslelf on i386 and sparc? Or
does it just not matter?  Should we have an --enable-tiny option which
makes all these space-saving moves, maybe even putting -Os onto the
flags, in case someone wants to use a handheld?  I actually know
someone wanting to use axiom on a handheld.

> > In any case, the hole is in the image, allocated in the .data
> > section,
> > but is not among the typed pages in the Lisp heap.  In particular, the
> > hole is not kept in contblock pages.  Hope this helps.  Suggestions
> > for improvements as always most welcome.
> > Take care,
> >
> Finally, please explain to me this notorious 2.5 Mb.
> This is information from /proc/<pid>/status for
> GCL after start and execution of (gbc t)(gbc nil)(gbc 1)
>                                       RSS      VM     Data
> 1) GCL with small hole size 128     2724K    8012K    732K
> 2) GCL with def hole size 512       2612K   10164K   2884K
> 3) GCL with 1000 hole size          2116K   48116K  40836K
> What is harmful about larger VM and Data sizes?
> Compare with initial memory layout for cmucl:
>                                       3928K    1.3M   1.3M

OK, you've convinced me that we must truly be in the running for the
tiny lisp cup, though this certainly is not our intention.  A 50M hole
does not appear to be a problem.  I must confess that I don't fully
understand the details about what the kernel physically does with all
these memory ranges.  I've always been told that the RSS was the
important one, and I get 1M, 1.1M, and 5M for GCL/128, GCL/512, and
cmucl respectively.  It is quite possible that the kernel mmaps the
.data section in as a read-only shared map, meaning that the real
memory is actually on disk and not in RAM at all.  If it is not
read-only, then perhaps the map is private, in which case the pages
are only mapped in from disk as they are written, e.g. copy-on-write.
I would appreciate enlightenment from those in the know.

Take care,

> Maybe I misunderstand some UNIX VM concepts.
> -- 
>       Vadim V. Zhytnikov
>        <address@hidden>
>       <address@hidden>

Camm Maguire                                            address@hidden
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah
