Re: [Gcl-devel] GCL memory allocation and GC problems

From: Vadim V. Zhytnikov
Subject: Re: [Gcl-devel] GCL memory allocation and GC problems
Date: Mon, 19 Jan 2004 23:53:41 +0300
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; ru-RU; rv:1.5) Gecko/20031006

Camm Maguire writes:

"Vadim V. Zhytnikov" <address@hidden> writes:

Let me clarify the misunderstanding.  When I first ran a fresh
gcl build with the new memory layout (faster growth, larger hole),
I noticed that right after start (room) shows more allocated contiguous
pages than the GCL permanently installed on my system.  My mistake was
to decide that this was some suspicious by-product of the new memory
parameters.  In fact my permanent GCL is quite old - September 2003.
Recent CVS GCL (before the new memory parameters) shows exactly the
same number of contiguous pages.  So the difference has nothing to do
with the recent changes.  But a noticeable difference between current
CVS and the one from September really exists.  It can be seen both
right after start and in the way GCL allocates large numbers of cont
blocks.  I've tried a simple test - creating a 20000-element list of
large random bignums with (set-gmp-allocate-relocatables nil).  The
resulting numbers with September's GCL are
  465/676      16661 contiguous blocks
while the same test with current GCL gives
  796/1152     24284 contiguous blocks
A strange difference.  But let's leave this problem for a moment.
It is not crucially important, since we are now trying to avoid
cont block allocation as much as possible.

GCL used to mark all contiguous pages as t_other on saving the system,
after a gbc.  I removed this as it appeared very wasteful, especially
with systems such as axiom which go through several preliminary saves
in the build process (i.e. the contiguous pages, even when in use,
were never full, so future modest mallocs were denied the residual
memory and forced to allocate new pages).  So in brief, the pages are
still in the image, just not reported by (room), as they are t_other.
While wasteful in terms of space, this might have been put in to save
time in the computationally expensive algorithms manipulating the
contiguous blocks.  I have not measured the effect myself, but I feel
a better approach is to improve the algorithm and/or use
non-contiguous pages wherever possible.
Here is the snippet (from gbc.c):

/*   for (i = 0;  i < maxpage;  i++) */
/*     if ((enum type)type_map[i] == t_contiguous) */
/*       type_map[i] = (char)t_other; */
/*   cb_pointer = NULL; */
/*   maxcbpage -= ncbpage; */
/*   if (maxcbpage < 100) */
/*     maxcbpage = 100; */
/*   ncbpage = 0; */
/*   ncb = 0; */
BTW, does anyone know of optimal malloc/free algorithms?  I can look
in libc if/when time permits.  Right now we just have a linked list
of free blocks ordered by size; on allocating and deallocating blocks
it is traversed from the beginning, stopping at the first block of
the requisite size:

void
insert_contblock(char *p, int s) {

  struct contblock **cbpp, *cbp;

  /* SGC cont pages: This used to return when s<CBMINSIZE, but we need
     to be able to sweep small (e.g. bignum) contblocks.  FIXME:
     should never be called with s<=0 to begin with.  CM 20030827 */
  if (s<=0)
    return;

  cbp = (struct contblock *)p;
  /* SGC cont pages: allocated sizes may not be zero mod CPTR_SIZE,
     e.g. string fillp, but alloc_contblock rounded up the allocation
     like this, which we follow here.  CM 20030827 */
  cbp->cb_size = ROUND_UP_PTR_CONT(s);
  for (cbpp = &cb_pointer;  *cbpp;  cbpp = &((*cbpp)->cb_link))
    if ((*cbpp)->cb_size >= s) {
      if (*cbpp == cbp) {
        /* inserting a block already on the list would create a cycle */
        fprintf(stderr,"Trying to install a circle at %p\n",cbp);
        return;
      }
      if (sgc_enabled) overlap_check(old_cb_pointer,cb_pointer);
      cbp->cb_link = *cbpp;
      *cbpp = cbp;
      if (sgc_enabled) overlap_check(old_cb_pointer,cb_pointer);
      return;
    }
  /* no block of sufficient size found: append at the end */
  cbp->cb_link = NULL;
  *cbpp = cbp;
  if (sgc_enabled) overlap_check(old_cb_pointer,cb_pointer);
}


If I recall correctly, this was the rate-limiting step on Vadim's
earlier benchmarks using ratsimp with contiguous-page bignums.
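
For reference, the usual way to avoid this linear scan is a segregated
free list: free blocks are binned by power-of-two size class, so
freeing is O(1) and allocation normally pops a single bin head.  Below
is a minimal sketch of the idea, with invented names (this is not GCL
code, and a real allocator would also split and coalesce blocks):

#include <stddef.h>

#define NCLASSES 26
#define MINBLOCK ((size_t)16)        /* smallest block, in bytes */

struct fblock { struct fblock *next; size_t size; };

/* bins[i] holds free blocks with MINBLOCK<<i <= size < MINBLOCK<<(i+1)
   (the top bin also takes everything larger). */
static struct fblock *bins[NCLASSES];

/* Largest class whose minimum size is <= s (used when freeing). */
static int class_floor(size_t s) {
  int i = 0;
  while (i < NCLASSES-1 && (MINBLOCK << (i+1)) <= s) i++;
  return i;
}

/* Smallest class whose minimum size is >= s (used when allocating). */
static int class_ceil(size_t s) {
  int i = 0;
  while (i < NCLASSES-1 && (MINBLOCK << i) < s) i++;
  return i;
}

/* Free: push onto the head of the right bin -- O(1).
   Assumes s >= MINBLOCK so the header fits in the block. */
static void seg_free(void *p, size_t s) {
  struct fblock *b = p;
  int i = class_floor(s);
  b->size = s;
  b->next = bins[i];
  bins[i] = b;
}

/* Allocate: every block in bins[class_ceil(s)] and above is already
   large enough, so the common case is a single pop off a bin head. */
static void *seg_alloc(size_t s) {
  int i;
  struct fblock **bp;
  for (i = class_ceil(s); i < NCLASSES; i++)
    for (bp = &bins[i]; *bp; bp = &(*bp)->next)
      if ((*bp)->size >= s) {        /* can only fail in the top bin */
        struct fblock *b = *bp;
        *bp = b->next;
        return b;
      }
  return NULL;                       /* caller would grab fresh pages */
}

Whether the constant factors actually beat the simple ordered list for
GCL's contblock sizes would of course need measuring.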

Is it worth increasing the hole size?  Judge for yourself.
Do you remember the (pass) test which we used for
cons page allocation?  I've tried the same test on several
CL implementations on an Athlon XP+ 2400.  These numbers are the
time elapsed by 10 (pass) calls:

clisp                                               -  12 sec
cmucl                                               -  12 sec
sbcl                                                -   8 sec

Anyone know what sbcl is doing differently here?

GCL current CVS with new growth and hole size 512   - 166 sec
The same but with old default hole size 128 (?)     - 785 sec
The same with 1024 hole size                        -  83 sec
The same but with 10000 hole size                   -  13 sec

I think that 785 seconds is really too much.

OK, so what?  Are these numbers _really_ important?  Maybe
2.5 Mb is more important?  Obviously there is no universal answer -
this is the usual space/speed tradeoff.  Let's consider the situation
from a practical point of view.  The main GCL customers - Maxima, ACL2
and Axiom - are large Lisp programs which are often used
for large computations.  Of course they can also be used
for something very simple, but even in such a situation, is an extra
2.5 Mb important?  In general I don't think so, taking into account
typical modern and not-so-modern hardware.  I can only imagine
something like a hand-held computer where this extra RAM may be of
real importance.  On the other hand, I understand that a large
default hole size cuts some flexibility.  So let's make the initial
hole size a new configure option (it is not so hard, is it?)
with the default value 4*MAXPAGE/1024 or even 8*MAXPAGE/1024.

OK, I'm going to do this, but the default needs to be MAXPAGE/10 to be
competitive in your results above.  The way we should look at it is:
how many core expansions do we think are reasonable in allocating the
full MAXPAGE amount of memory?  The larger the hole, the fewer the
steps, and the faster the allocation, though some granularity issues
arise in the last step, at least with the current algorithm.  10 feels
like a good number to me.  Do you agree?

Well, notice that the experimental value of 10000 pages for the hole
(really purely experimental - I have no idea how to estimate the
optimal holesize theoretically) is an absolute value, not relative to
MAXPAGE.  Since the holesize is now proportional to MAXPAGE, one can
get a larger holesize by building gcl with a larger MAXPAGE.
BTW, maybe we could change the default MAXPAGE to some larger
value.  As far as I can see there is nothing terribly wrong with that.
The current value 128*1024 - i.e. 512Mb - is based on typical RAM for
a modern PC.  As for the hole size as some fraction of MAXPAGE:
too big a holesize compared to MAXPAGE is not good either.
Do you remember some of my other tests with a very large hole
size (almost the whole MAXPAGE)?  gcl just stops allocating new
cells.  I don't understand precisely why - probably it is
related to the way GCL extends the hole when required
(there is a funny formula in alloc.c which counts available
pages).  After some thinking I suppose that the value 64*MAXPAGE/1024
may be a good choice.
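
In code the proposal is tiny; a hypothetical sketch (macro names
invented here, not the actual ones in alloc.c):

/* Hypothetical: initial hole size as a fraction of MAXPAGE, settable
   at configure time.  Note 64*MAXPAGE/1024 is just MAXPAGE/16. */
#ifndef INITIAL_HOLE_DIV
#define INITIAL_HOLE_DIV 16
#endif
#define INITIAL_HOLE_SIZE (MAXPAGE / INITIAL_HOLE_DIV)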

Here's a schematic of GCL's memory layout:

.text:   main                          (0x804ad30 in my image)
         all compiled C files in o/
         user_match                    (0x81870a0 in my image)
.data:                                 (0x81a53a0 in my image)
         intermixed typed pages
         (i.e. cons, string, contblock...)
         (contblocks hold loaded compiled
          lisp objects, areas malloced by
          external libs, and, formerly, the hole)
         heap_end                      (0x8451000 in my image)
         hole (512 pages)
         rb_pointer (relblock area)    (0x865102c in my image)
         core_end                      (0x871d000 in my image)
which corresponds to a 7Mb+ DRS reported by 'ps axvwm'.  What is
interesting, and beyond my understanding at the moment, is that the
image size on disk is only ~4.5Mb -- there must be some sort of
compression going on, I'd think.

Actually the image size behaves somewhat strangely to me.  I just
built from scratch the current CVS gcl and a few-days-ago CVS gcl.
Current unstripped saved_gcl size - 6Mb.
Previous - 10Mb.  Why?

There was a bug in which -g was being added to the CFLAGS in all cases.
On a separate but related note, as much as I like the bfd
quasi-portable relocation scheme, it does take up almost an extra
meg.  Do we want to default to the old sfaslelf on i386 and sparc? Or
does it just not matter?  Should we have a --enable-tiny option which
makes all these space-saving moves, maybe even putting -Os onto the
flags, in case someone wants to use a handheld?  I actually know
someone wanting to use axiom on a handheld.

In any case, the hole is in the image, allocated in the .data section,
but it is not among the typed pages in the Lisp heap.  In particular,
the hole is not kept in contblock pages.  Hope this helps.  Suggestions
for improvements are as always most welcome.
Take care,

Finally, please explain to me this notorious 2.5 Mb.
This is the information from /proc/<pid>/status for
GCL after start and execution of (gbc t)(gbc nil)(gbc 1):

                                     RSS      VM     Data

1) GCL with small hole size 128     2724K    8012K    732K
2) GCL with def hole size 512       2612K   10164K   2884K
3) GCL with 1000 hole size          2116K   48116K  40836K

What is harmful in a larger VM and Data size?
Compare with the initial memory layout for cmucl:

                                     3928K    1.3M   1.3M

OK, you've convinced me that we must truly be in the running for the
tiny lisp cup, though this certainly is not our intention.  A 50M hole
does not appear to be a problem.  I must confess that I don't fully
understand the details of what the kernel physically does with all
these memory ranges.  I've always been told that the RSS was the
important one, and I get 1M, 1.1M, and 5M for GCL/128, GCL/512, and
cmucl respectively.  It is quite possible that the kernel mmaps the
.data section in as a read-only shared map, meaning that the real
memory is actually on disk and not in ram at all.  If it is not
read-only, then perhaps the map is private, in which case pages are
mapped in from disk on demand and copied only when written, i.e.
copy-on-write.  Would appreciate enlightenment as well from those in
the know.
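
As a side note on why a large untouched hole can be cheap: anonymous
memory counts toward VM size as soon as it is mapped, but enters RSS
only when the pages are actually touched.  A minimal demonstration,
assuming Linux (the program is illustrative only, not GCL code):

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
  size_t len = (size_t)50 * 1024 * 1024;          /* a 50M "hole" */
  char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return 1;

  /* VM has grown by 50M, RSS has not: the kernel has only recorded
     the mapping.  Compare VmSize/VmRSS in /proc/self/status here. */
  puts("mapped; press Enter to touch the pages");
  getchar();

  memset(p, 1, len);   /* writing faults the pages in: now RSS grows */
  puts("touched; RSS now includes the region; press Enter to exit");
  getchar();

  munmap(p, len);
  return 0;
}

Run it and watch VmSize versus VmRSS in /proc/self/status at each
pause.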

Probably you have already guessed that I meant 1.3Gb 1.3Gb in the last
cmucl example above.  And unix's free shows no swap usage at all with
cmucl running on my linux box with 512Mb of physical RAM.  So really
only RSS matters.

     Vadim V. Zhytnikov

