[Gnash-dev] Garbage Collection

gnash-dev
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Gnash-dev] Garbage Collection

From:	Chad Musick
Subject:	[Gnash-dev] Garbage Collection
Date:	Wed, 05 Sep 2007 19:50:20 +0900
Hello everyone,

For your consideration.


A more expansive garbage collection proposal --

I think we ought to unify our garbage collection scheme, and I propose
that we use either the attached collector or something similar. This GC
would run concurrently with the rest of the program and require some
basic support from the objects code, as I'll explain. The following
classes are defined:

GC -- the garbage collector itself, a singleton accessed only through
static functions.

GC_runner -- Also a singleton, this is the thread which runs
continuously for garbage collecting.

GcResource -- the base class from which (I think) all objects which
might ever be dynamically allocated ought to derive.

GcInvoke -- Explained in the How To Use section, this is a mechanism to
allow the GC to run concurrently.

ModifyLock -- A typedef to a boost scoped_lock
ModifySync -- a typedef to a boost mutex

Allocation of new memory
========================

1) When a new object is dynamically allocated, its memory address is
added to a list of objects allocated in this thread.  When all objects
which will be allocated at a certain point (defined by GcInvoke and in
the 'How to Use' section) are finished constructing, these objects are
added to the list of resources maintained by the garbage collector.
This ensures that new objects are not recycled before they have a chance
to become reachable.

2) When a new object is added to the list of GcResources, it goes into a
list of new additions.  This may happen from (1), or it may be done
directly -- doing so directly is more expensive if multiple allocations
are being done, as at parsing time.

Marking reachable objects
=========================

1) When the collector wants to mark the reachable objects, it first
moves all objects from the new additions list into its list of managed
memory.

2) Beginning from the roots (see the 'How to Use' section), all
reachable memory is marked.  When an object is seen for a second time in
the marking, it is not re-marked. This avoids loops.

3) While the objects are being marked, new objects may be added to the
Gc, since these will go on the 'new additions' list -- they may be
marked, but they won't be collected.

Removing unreachable objects
============================

1) All objects in the list of managed memory which are not marked as
reachable are removed.

How To Use
==========

Allocating new GcResources
==========================
A GcInvoke object should be on the stack when new GcResource objects are
heap allocated.  The GcInvoke object either 'on' or 'off'. ('on' is the
default). If it is on, the newly allocated memory will be managed by the
Gc. If it is off, the newly allocated memory must be managed by hand.

When all GcInvoke objects have gone out of scope (they may be nested),
any GcResource objects which were heap allocated in the scope of an 'on'
GcInvoke is added to the list of managed objects.

By constructing in this way, the newly allocated resources don't face
the possibility of collection until they have had a chance to be placed
into a reachable state.  With a single-threaded GC this is not a
problem, but since the collection happens concurrently, it otherwise
risks this:
    SomeGcClass *p = new SomeGcClass;
    /* At this point, or perhaps even during the construction of p, the
GC finishes its last run and beings a new run. Since p has been put into
the new list of memory at allocation time (not at the finish of
construction), it is in the memory list but not reachable, so it may be
destroyed during construction, or: */
    /* p is DESTROYED here by the Gc, since it isn't reachable (no root)
*/
    p->do_something();
    /* or maybe p is DESTROYED here by the Gc. */
    this->markable_member = p;
    /* Whew! p is finally safe, right? Only if 'this' is safe and
unmarked in this GC cycle. */
Instead, the following code is used:
  { // Scope.
    GcInvoke GcI;
    SomeGcClass *p = new SomeGcClass;
    p->do_something();
    this->markable_member = p;
    /* It would be safe to manage p from here, if it is safe to manage
'this' */
  } /* GcInvoke falls out of scope. 'p' is as safe as 'this'. Since
GcInvoke objects stack, 'p' is only managed once 'this' is managed. */

A Caveat: It is NOT safe to spawn a new thread inside the scope of a
GcInvoke. (Strictly speaking, any dynamic resources allocated in a
spawned thread must provide for their own safety, and not rely on
'this'.)

Marking GcResources
===================
Objects do not need to be immutable while they are being marked, except
that the _structure_ of any containers which hold GcResource objects
must not be changed while the object is being marked.

There are three functions (with some variants) used to make this
possible, all defined in GcResource:
1) mark(const GcResource*); -- This marks a single contained resource.
It is NOT safe to mark a resource in this way:
  if (p) p->setReachable();
because 'p' may be invalidated between 'if (p)' and 'p->'.

2) markContainer(...); -- This marks all resources in an iterable
container. It requires that a mutex be passed to it, with which it will
lock the container until it has copied a list of the pointers.  This
should be the same mutex that locks any modificactions to the container
structure (see 'Needed Support').

3) markRContainer(...); -- This is the same as markContainer, but the
Container must be locked any time it OR its contents are modified. These
are containers which contain instances of GcResource objects, and not
pointers to them. These may still need to be marked to keep other
resources reachable.

4) markMap(...); -- This is the same as markContainer, but the resources
of a map are stored in a pair. This function just adds '->second' to
find the resource, rather than the pair object.

Running the GC
==============
The garbage collector runs concurrently and self-constructs when the
first manageable memory is allocated.  It may never be destroyed,
although it may have its memory cleared (call 'obliterateResources' --
the name is a warning).

Calling 'collectSoon' will schedule the GC to run the next time that its
thread is awakened.

Calling 'pause' will force the GC to finish its current collection (the
caller waits for this) and not to begin any new collections until
'resume' is called. Maybe useful in special cases, but ordinarily should
not be used.

Calling 'manage' with a GcResource will cause that resource to be
managed -- this might be needed if the resource was allocated in an
'off' GcInvoke and now needs to change its mind.  It is cheaper and
safer to use the GcInvoke structure to manage memory, so this is another
specialty function.

Needed Support
==============

Only heap allocate GcResources within the scope of a GcInvoke object.

Don't spawn threads in the scope of a GcInvoke object.

Call 'touch' on any resources you receive from outside and store in
'this'. E.g.:
void someResourceClass::setMyPtr(GcResource *p) { p->touch(); m_p = p; }

Use a ModifyLock to lock any containers which contain GcResources before
modifying their structure.  Lock them before modifying their contents if
they contain objects and not pointers.

Define a function 'void markResources() const' which uses the 'mark',
and 'mark*' variants to mark all of its contained resources.

Correctness, Aggressiveness of the Algorithm
============================================
The scheme used is essentially a tri-color marking scheme. See
http://www.memorymanagement.org/glossary/t.html#tri-color.marking
In this terminology:
Objects are black until they have been released into the GC (GcInvoke
scheme).

Objects are white when their parity doesn't match that of the GC, or
when their reachable flag is off. The parity avoids a need to keep a
list of all objects. This allows GcResource objects to be stack
allocated when this makes better sense.

Objects are gray after setReachable is called in them.  Since
markResources calls setReachable in all contained resources, the object
is black when this function returns.  In effect, only roots ever remain
gray, since all other gray objects are black by the time that marking
them gray has returned. To maintain the tri-color invariant, no white
sub-resources may be added to a black object.  In order to not violate
the invariant, any added resources are tinted gray by the 'touch'
function. (This function adds them as root objects to the GC to color
them gray, and the GC removes them from the root list once it makes them
black.)

It is possible that some objects which are not reachable will escape
collection for one cycle of collection. This may occur if a resource is
touched, and then subsequently discarded.  Since it was touched, it is a
root until the finish of the GC run, and so it will not be collected on
that run.

Drawbacks of the system
=======================
The biggest drawback is that all objects which store GcResource objects
must themselves be GcResource objects. This could be changed by
introducing a common base class for GcResource objects and 'roots' of
GcResource objects, but I recommend against this.  Since GcResource
objects may be stack allocated under this scheme with minimal overhead,
this is probably not as big a problem as it sounds.

Non-deterministic collection -- because the GC runs concurrently, it is
not possible to know exactly when a given object will be collected.  It
would be possible to write a method to allow explicit deletion, but this
would require one of two things:
linear time deletion of the resource
storage of an iterator in all resources to allow constant-time deletions
In either case, if you really know that it is safe to delete an object,
you should probably either stack allocate it or use the GcInvoke scheme
to manage it yourself from the beginning.

Complexity -- Although I have designed the mechanisms to be as straight
forward as I could, they require discipline in their use.  I would
welcome any suggestions about how to better structure these, as I can
only write them in a way that seems most straightforward to me --
someone else may have a better idea.

Benefits of the system
======================
Safety -- the current system disallows the stack allocation of
resources, but there is no C++ language feature to enforce this.  This
restriction may be violated by a fully ISO-compliant compiler, so this
is a portability issue as well. (Even a private constructor may be
accessed by another member of the same class, and this solution would
require a factory for each type of object.)

Unification -- all garbage collected resources would be managed under a
single system.

Configurability -- This would allow easier control of memory issues.  As
an example, the 'new' operator could be specialized to allow the
run-time limiting of heap size or block allocations for the heap. The GC
could be tested under various scenarios for the optimal times to
collect, since scheduling a collection does not make the caller wait.

Responsiveness -- Because this GC runs concurrently with the rest of
Gnash, it is not necessary to pause the program in order to collect
garbage.  This will result in greater responsivity of the program in
some instances.
GC.h
Description: Text Data
GC.cpp
Description: Text Data
[Prev in Thread]
Current Thread
[Next in Thread]
[Gnash-dev] Garbage Collection, Chad Musick <=
- Re: [Gnash-dev] Garbage Collection, strk, 2007/09/19
Prev by Date: Re: [Gnash-dev] Assigning to htmlText of TextField does not work
Next by Date: Re: [Gnash-dev] Assigning to htmlText of TextField does not work
Previous by thread: [Gnash-dev] Many thanks for gnash
Next by thread: Re: [Gnash-dev] Garbage Collection
Index(es):
- Date
- Thread