Re: [Gcl-devel] Using all physical memory effectively with multiple processes


From: Matt Kaufmann
Subject: Re: [Gcl-devel] Using all physical memory effectively with multiple processes
Date: Wed, 6 May 2015 09:08:12 -0500

Hi, Camm --

That looks great to me!  So in short, I could make the following my
shell script for invoking GCL.  According to a few experiments, MFLAGS
is set if and only if we are doing a make with -j n for some n>1, so
we will get the pool-based behavior only in that case, and the
old-style single-process optimized behavior otherwise (if I understand
correctly).

#!/bin/sh
# Enable the shared memory pool only under a parallel make, which
# (per the experiments above) is when MFLAGS is set.
if [ "$MFLAGS" != "" ] ; then
    GCL_MULTIPROCESS_MEMORY_POOL=t
    export GCL_MULTIPROCESS_MEMORY_POOL  # must be exported to reach gcl
fi
exec /p/bin/gcl-2.6.13pre14a "$@"

Does this make sense?

Thanks --
-- Matt
   From: Camm Maguire <address@hidden>
   Cc: Matt Kaufmann <address@hidden>, address@hidden
   Date: Wed, 06 May 2015 09:42:53 -0400

   Greetings!  This is just a discussion post on where things stand.
   Please feel free to skip whatever you wish, but any feedback is of
   course helpful.

   Bob makes the excellent point that we should design things to make one
   process run as fast as possible, and forget about other jobs as much as
   possible.  Given my experiments thus far, it looks like this approach
   might win out in any case.  Bob, I hope you are pleased by this :-).  

   That said, the attempt to use all of physical ram conflicts openly with
   multiple jobs, so something must be done, even if minimal.  And the
   minimal solution is this environment variable:

   GCL_MEM_MULTIPLE=0.125

   will multiply the physical ram seen by each process by this value.  So
   make -j 8 GCL_MEM_MULTIPLE=0.125 is the logical approach, though one
   might do better by raising the 0.125 somewhat, as all jobs won't use
   all of that memory at once anyway.
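
   Roughly, the idea looks like this C sketch (illustrative names only,
   linux/glibc sysconf assumed; this is a sketch of the idea, not GCL's
   actual source):

   #include <stdlib.h>
   #include <unistd.h>

   /* Sketch: each process scales the physical ram it detects by
      GCL_MEM_MULTIPLE, defaulting to 1.0 when the variable is unset. */
   static double effective_phys_ram(void)
   {
     const char *s = getenv("GCL_MEM_MULTIPLE");
     double mult = s ? atof(s) : 1.0;
     return mult * (double)sysconf(_SC_PHYS_PAGES)
                 * (double)sysconf(_SC_PAGESIZE);
   }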

   On the plus side, each process decides when to start gc independently.
   On the minus side, big jobs will bear a larger gc load than they would
   have to in theory.

   So the other approach is this environment variable:

   GCL_MULTIPROCESS_MEMORY_POOL=t

   which (only when set) will maintain a shared locked file /tmp/gcl_pool
   containing the summed resident set size of all processes, and use this
   as the value to compare against physical ram when deciding we're full
   enough to start gc.  This is working, and one can see (via top) how big
   jobs are afforded more ram.  Paradoxically, it may or may not improve
   the overall regression time.  We'll know more here soon.
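
   For the curious, here is a rough C sketch of how such a pool might be
   maintained under flock (the file format and names are illustrative;
   the actual implementation may differ):

   #include <stdio.h>
   #include <unistd.h>
   #include <sys/file.h>

   /* Our resident set size in bytes, from /proc/self/statm (second
      field is resident pages); linux-specific. */
   static long my_rss_bytes(void)
   {
     long size = 0, resident = 0;
     FILE *f = fopen("/proc/self/statm", "r");
     if (!f) return 0;
     if (fscanf(f, "%ld %ld", &size, &resident) != 2) resident = 0;
     fclose(f);
     return resident * sysconf(_SC_PAGESIZE);
   }

   /* Under an exclusive lock, replace our previous contribution to the
      pool-wide sum with our current rss and return the new sum, which
      the gc trigger can then compare against physical ram.  *last
      remembers what this process reported the time before. */
   static long update_pool(long *last)
   {
     long sum = 0, rss = my_rss_bytes();
     FILE *f = fopen("/tmp/gcl_pool", "r+");
     if (!f) f = fopen("/tmp/gcl_pool", "w+");
     if (!f) return rss;
     flock(fileno(f), LOCK_EX);
     if (fscanf(f, "%ld", &sum) != 1) sum = 0;
     sum += rss - *last;
     *last = rss;
     rewind(f);
     ftruncate(fileno(f), 0);
     fprintf(f, "%ld\n", sum);
     fflush(f);
     flock(fileno(f), LOCK_UN);
     fclose(f);
     return sum;
   }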

   There are two environment variables which jointly determine the gc
   threshold:

   GCL_GC_PAGE_THRESH (default 0.75)

   means we will not start gc until the data size is at least 0.75 of
   physical ram.  This can be set to 1.0, and perhaps logically should be, but
   remember that GCL is constantly calling gcc in a subprocess, and this
   can be a memory hog leading to a swap storm.  Alas, at this point I know
   of no way to manage the memory use of gcc, so this value is a heuristic.

   GCL_GC_ALLOCATION_THRESH (default 0.125)

   means we will not gc until we have allocated (since the last gc) one
   eighth of physical ram.  
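
   Taken together, the trigger amounts to something like the following
   sketch (the helper and names are for illustration only, not GCL's
   actual source):

   #include <stdlib.h>

   /* Illustrative helper: a float from the environment, with a default. */
   static double env_or(const char *name, double dflt)
   {
     const char *s = getenv(name);
     return s ? atof(s) : dflt;
   }

   /* gc starts only once the data size reaches GCL_GC_PAGE_THRESH of
      physical ram AND we have allocated GCL_GC_ALLOCATION_THRESH of
      physical ram since the last gc.  All quantities in bytes. */
   static int gc_due(double data_size, double alloc_since_gc, double phys_ram)
   {
     return data_size      >= env_or("GCL_GC_PAGE_THRESH", 0.75) * phys_ram
         && alloc_since_gc >= env_or("GCL_GC_ALLOCATION_THRESH", 0.125) * phys_ram;
   }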

   This threshold scheme is an alternative solution to the problem of
   rebalancing maxpages,
   whereby a job could load up on cons for a long time, leave a tiny array
   allocation, then start allocating arrays when there is no more physical
   ram to expand into.  Recall that the variable
   si::*optimize-maximum-pages* would attempt to collect gc statistics and
   rebalance these maxpage limits based on the actual demand.  This is OK
   as a workaround, but it does require you to start collecting statistics
   before it's 'too late' and you've already allocated most of physical ram.

   But the real problem is that gc cost is proportional to heap size and
   live heap size, and triggering based on an unrelated quantity
   (suballocation of a given data element size) makes no real sense.
   Earlier in the 2.6.13 series, we found that simply scaling the maxpages
   to physical ram at the outset was a big win, but then again, all we had
   to scale by was the current allocation in the saved image, which makes
   no real sense.

   So in short, when si::*optimize-maximum-pages* is set GCL will now
   ignore maxpage settings as a gc trigger, and use the above thresholds
   instead.  When unset, GCL will use minimal maxpage expansion via its
   traditional algorithm and trigger (frequent) gc when these maxpage
   limits are hit, without any attempt to collect statistics to
   expand/rebalance them.  This mode is to be used when preparing a small
   image to be saved to disk, e.g. at acl2 build time.
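
   In sketch form (again, names here are illustrative rather than GCL's
   actual source), the mode split is:

   /* With si::*optimize-maximum-pages* set, maxpage limits are ignored
      as a gc trigger in favor of the two thresholds above; when unset,
      gc fires as soon as the traditional maxpage limit is hit. */
   static int gc_due_for_mode(int optimize_maximum_pages,
                              double data_size, double alloc_since_gc,
                              double phys_ram, double page_thresh,
                              double alloc_thresh,
                              long pages_in_use, long maxpage_limit)
   {
     if (optimize_maximum_pages)
       return data_size >= page_thresh * phys_ram
           && alloc_since_gc >= alloc_thresh * phys_ram;
     return pages_in_use >= maxpage_limit;
   }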

   My concern is that there appear to be too many variables here.  At a minimum,
   we need a 'small image to be saved to disk' mode, and a 'use as much ram
   as possible for speed' mode, and some mechanism to reduce the ram used
   when running multiple jobs.  But in principle the last three environment
   variables could be removed and replaced with constants.

   Version_2_6_13pre14a is built and installed at UT, and has been
   undergoing testing since last night.  It looks solid so far.

   Thoughts most appreciated.

   Take care, 



   Robert Boyer <address@hidden> writes:

   >> This seems closest in the spirit to sol-gc.
   >
   > As best I can guess, Acl2 is headed towards
   > not using sol-gc in CCL in the 7.1 release of Acl2.
   >
   > It's not my place to speak, and those who know may
   > say that any problem with sol-gc may have been, who really knows, that it was
   > using interrupts of the gc and that was too dangerous to do.  Interrupts
   > should scare the crap out of anyone.
   >
   > But Sol's main idea I think was to allocate a hell of a lot of memory,
   > all of the memory, for the heap to free space after a gc in order to
   > keep gc costs as low as possible for this one process.  And to hell
   > with any other processes except this one.
   >
   > Camm,
   >
   > I think that your objective should be for j=1 speed and not j=8 at
   > all.  The ordinary user almost all of the time is using j=1, and as
   > far as I know, only people like Matt regularly use j=8, and that only
   > for regression testing before they release a new version of Acl2.
   >
   > Just my two cents worth.  I would certainly go with whatever Matt advises,
   > rather than with what I advise.
   >
   > Bob
   >
   > On Mon, May 4, 2015 at 11:51 AM, Camm Maguire <address@hidden> wrote:
   -- 
   Camm Maguire                                     address@hidden
   ==========================================================================
   "The earth is but one country, and mankind its citizens."  --  Baha'u'llah



