emacs-devel

Let's make the GC safe and iterative (Was: Re: bug#30626)


From: Daniel Colascione
Subject: Let's make the GC safe and iterative (Was: Re: bug#30626)
Date: Thu, 1 Mar 2018 15:22:39 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0

Noam mentioned that I should make a new thread for this proposal, so I'm posting an edited version of my original message.
tl;dr: we should be able to make the GC non-recursive with minimal 
overhead, solving the "Emacs crashed because we ran out of stack space 
in GC" problem once and for all.
On 02/27/2018 10:08 AM, Eli Zaretskii wrote:
> What can we do instead in such cases?  Stack-overflow protection
> cannot work in GC, so you are shooting yourself in the foot by
> creating such large recursive structures.  By the time we get to GC,
> where the problem will happen, it's too late, because the memory was
> already allocated.
>
> Does anyone have a reasonable idea for avoiding the crash in such
> programs?
We need to fix GC being deeply recursive once and for all. Tweaking 
stack sizes on various platforms and trying to spot-fix GC for the 
occasional deeply recursive structure is annoying. Here's my proposal:
I. NAIVE APPROACH

Turn garbage_collect_1 into a queue-draining loop, initializing the object queue with the GC roots before draining it. We'll make mark_object put an object on this queue, turning the existing mark_object code into a mark_queued_object function.
garbage_collect_1 will just call mark_queued_object in a loop; 
mark_queued_object can call mark_object, but since mark_object just 
enqueues an object and doesn't recurse, we can't exhaust the stack with 
deep object graphs. (We'll repurpose the mark bit to mean that the 
object is on the to-mark queue; by the time we fully drain the queue, 
just before we sweep, the mark bit will have the same meaning it does now.)
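
A very rough sketch of that control flow follows; this is not actual Emacs code, and everything here other than mark_object and garbage_collect_1 themselves (mark_roots, object_marked_p, set_object_mark_bit, the gc_queue_* helpers, gc_sweep) is a placeholder name. mark_queued_object stands for the body of today's recursive mark_object.

static void
mark_object (Lisp_Object obj)
{
  /* The mark bit now means "already on the to-mark queue", so each
     object is enqueued at most once.  */
  if (!object_marked_p (obj))
    {
      set_object_mark_bit (obj);
      gc_queue_push (obj);
    }
}

static void
garbage_collect_1 (void)
{
  mark_roots ();                /* mark_object on every GC root */
  while (!gc_queue_empty ())
    /* Traces the object's references by calling mark_object on each
       of them, which only enqueues; the C stack stays flat.  */
    mark_queued_object (gc_queue_pop ());
  gc_sweep ();
}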
We can't allocate memory to hold the queue during GC, so we'll have to 
pre-allocate it. We can implement the queue as a list of queue blocks, 
where each queue block is an array of 16k or so Lisp_Objects. During 
allocation, we'll just make sure we have one Lisp_Object queue-block 
slot for every non-self-representing Lisp object we allocate.
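
One possible shape for those pre-allocated queue blocks, and for the capacity accounting the allocator would do, might look like the sketch below; all names here (gc_queue_block, gc_queue_free_list, gc_queue_reserve) are invented for illustration.

#define GC_QUEUE_BLOCK_NOBJS (16 * 1024)

struct gc_queue_block
{
  struct gc_queue_block *next;          /* next on queue or free list */
  ptrdiff_t used;                       /* slots filled so far */
  Lisp_Object objs[GC_QUEUE_BLOCK_NOBJS];
};

static struct gc_queue_block *gc_queue_free_list;

/* Called from the allocation paths: keep at least one queue slot in
   reserve for every non-self-representing object ever allocated.  */
static void
gc_queue_reserve (ptrdiff_t nobjects)
{
  static ptrdiff_t slots_needed, slots_reserved;
  slots_needed += nobjects;
  while (slots_reserved < slots_needed)
    {
      struct gc_queue_block *b = xmalloc (sizeof *b);
      b->used = 0;
      b->next = gc_queue_free_list;
      gc_queue_free_list = b;
      slots_reserved += GC_QUEUE_BLOCK_NOBJS;
    }
}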
Since we know that we'll have enough queue blocks for the worst GC case, 
we can have mark_object pull queue blocks from a free list, aborting if 
for some reason it ever runs out of queue blocks. (The previous 
paragraph guarantees we won't.) garbage_collect_1 will churn through 
these queue blocks and place each back on the free list after it has 
called mark_queued_object on every Lisp_Object in the queue block.
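
Continuing the sketch above, the enqueue and drain sides might look roughly like this; gc_queue_head is the hypothetical list of blocks waiting to be drained, and gc_queue_push is the helper the naive sketch assumed.

static struct gc_queue_block *gc_queue_head;

static void
gc_queue_push (Lisp_Object obj)
{
  struct gc_queue_block *b = gc_queue_head;
  if (b == NULL || b->used == GC_QUEUE_BLOCK_NOBJS)
    {
      /* The reservation done at allocation time guarantees the free
         list is never empty here; running dry would be a bug.  */
      b = gc_queue_free_list;
      if (b == NULL)
        emacs_abort ();
      gc_queue_free_list = b->next;
      b->used = 0;
      b->next = gc_queue_head;
      gc_queue_head = b;
    }
  b->objs[b->used++] = obj;
}

/* The drain loop in garbage_collect_1: empty one block at a time and
   hand it back to the free list, where it can be reused right away.  */
static void
gc_drain_queue (void)
{
  while (gc_queue_head)
    {
      struct gc_queue_block *b = gc_queue_head;
      while (b->used > 0)
        mark_queued_object (b->objs[--b->used]);
      /* Marking may have pushed a fresh block on top of B; if so,
         leave B in place and retire it when we reach it again.  */
      if (b == gc_queue_head)
        {
          gc_queue_head = b->next;
          b->next = gc_queue_free_list;
          gc_queue_free_list = b;
        }
    }
}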
In this way, in non-pathological cases of GC, we'll end up using the 
same few queue blocks over and over. That's a nice optimization, because 
we can MADV_DONTNEED unused queue blocks so the OS doesn't actually have 
to remember their contents.
Overall, I think we can make the current GC model recursion-proof 
without drastically changing how we allocate Lisp objects. The 
additional memory requirements should be modest: it's basically one 
Lisp_Object per Lisp object allocated.
II. ELABORATION

The naive version of this scheme needs about 4.6MB of overhead on my current 20MB Emacs heap, but it should be possible to reduce the overhead significantly by taking advantage of the block allocation we do for conses and other types --- we can put whole blocks on the queue instead of pointers to individual block parts, so we can get away with a much smaller queue.
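
As a rough sanity check of that 4.6MB figure (my own back-of-the-envelope estimate, assuming 8-byte Lisp_Object queue slots, not numbers from any measurement):

  4.6 MB of queue / 8 bytes per slot ≈ 600k slots, i.e. one slot for
  each of roughly 600k heap objects, or an average of about 35 bytes
  per object on a 20MB heap; plausible once strings, vectors, and
  buffers are averaged in with 16-byte conses.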
It's also interesting to note that we don't need separate queue blocks 
to put a block on the queue, as we do if we want to enqueue individual 
Lisp_Object pointers. Instead, we can add to each block type a pointer 
to the next block *on the to-be-marked queue* and a bitmask yielding the 
positions within that block that we want to mark.
For example, cons_block right now looks like this:

struct cons_block
{
  /* Place `conses' at the beginning, to ease up CONS_INDEX's job.  */
  struct Lisp_Cons conses[CONS_BLOCK_SIZE];
  bits_word gcmarkbits[1 + CONS_BLOCK_SIZE / BITS_PER_BITS_WORD];
  struct cons_block *next;
};

We'd turn it into something like this:

struct cons_block
{
  /* Place `conses' at the beginning, to ease up CONS_INDEX's job.  */
  struct Lisp_Cons conses[CONS_BLOCK_SIZE];
  bits_word gcmarkbits[1 + CONS_BLOCK_SIZE / BITS_PER_BITS_WORD];
  /* One bit per cons: set if the cons is waiting to be scanned.  */
  bits_word scan_pending[1 + CONS_BLOCK_SIZE / BITS_PER_BITS_WORD];
  struct cons_block *next;
  /* Next block on the to-be-scanned queue, or NULL if not queued.  */
  struct cons_block *next_scan_pending;
};

When we call mark_object on a cons, we'll look up its cons_block and look up the cons in gcmarkbits. If we find the cons mark bit set, we're done. Otherwise, we look at the scan_pending bit for the cons cell. If _that's_ set, we're also done. If we find the scan_pending bit unset, however, we set it, and then look at next_scan_pending. If that's non-zero, we know the block as a whole is enqueued for scanning, and we're done. If *that's* zero, then we add the whole block to the to-be-scanned queue.
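
Sketched in C, the cons path might look like the code below. cons_block_of, cons_index_in_block, the SCAN_PENDING_* macros, and cons_blocks_pending_scan are invented names; GETMARKBIT is meant to be alloc.c's existing per-block mark-bit macro. Note that a NULL next_scan_pending alone can't distinguish "not queued" from "last block on the queue", so this sketch also compares against the list head.

static struct cons_block *cons_blocks_pending_scan;

static void
mark_cons (struct Lisp_Cons *c)
{
  struct cons_block *b = cons_block_of (c);      /* containing block */
  int i = cons_index_in_block (b, c);            /* index within it */

  if (GETMARKBIT (b, i))
    return;                     /* already marked: nothing to do */
  if (SCAN_PENDING_BIT_P (b, i))
    return;                     /* already queued for scanning */
  SET_SCAN_PENDING_BIT (b, i);

  /* Enqueue the whole block, but only if it isn't queued already.  */
  if (b->next_scan_pending == NULL && b != cons_blocks_pending_scan)
    {
      b->next_scan_pending = cons_blocks_pending_scan;
      cons_blocks_pending_scan = b;
    }
}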
We'll modify garbage_collect_1 to drain both the Lisp_Object queue I 
described in the last section (which we still need for big objects like 
buffers) *and* the queue of blocks pending scanning. When we get a cons 
block, we'll scan all the conses with scan_pending bits set to one, set 
their gcmarkbits, and remove the cons block from the queue.
That same cons block might make it back onto the queue later if someone 
calls mark_object for one of its conses we didn't already scan, but 
that's okay. Scanning scan_pending should be very cheap, especially on 
modern CPUs with bit-prefix-scan instructions.
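
A sketch of draining one pending cons block follows. enqueue_cons_references is an invented helper that calls mark_object on the car and cdr of a cons, SETMARKBIT is meant to be alloc.c's existing mark-bit macro, and a real implementation would presumably replace the explicit bit loop with a count-trailing-zeros builtin.

static void
scan_pending_cons_block (struct cons_block *b)
{
  /* The caller has already popped B off the pending-scan list and
     cleared b->next_scan_pending, so marking a car or cdr that lives
     in B can legitimately re-enqueue it.  */
  int nwords = 1 + CONS_BLOCK_SIZE / BITS_PER_BITS_WORD;
  for (int w = 0; w < nwords; w++)
    while (b->scan_pending[w] != 0)
      {
        /* Find and clear the lowest set scan_pending bit.  */
        int bit = 0;
        while (!(b->scan_pending[w] & ((bits_word) 1 << bit)))
          bit++;
        b->scan_pending[w] &= ~((bits_word) 1 << bit);

        int i = w * BITS_PER_BITS_WORD + bit;
        SETMARKBIT (b, i);      /* now genuinely marked */
        /* Enqueue the car and cdr; this only sets bits and queues
           blocks, so the C stack stays flat.  */
        enqueue_cons_references (&b->conses[i]);
      }
}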
Under this approach, the reserved-queue-block scheme would impose an 
overhead of somewhere around 1MB on the same heap. (I suspect it'd 
actually be a bit smaller.) Conses, strings, and vectors are the 
overwhelming majority of heap-allocated objects, and thanks to block 
packing, we'd get bookkeeping for them practically for free. This 
amount of overhead seems reasonable. I think we may end up using less 
memory than we would for the recursive mark_object call stack.
This scheme interacts well with the portable dumper too. pdumper already 
uses a big bit array to store mark bits; we'd just add another array for 
its scan_pending. We'd basically treat the entire pdumper region as one 
big cons_block for GC purposes.
What do you think? I think this approach solves a longstanding fiddly 
problem with Emacs GC without too much disruption to the internals. It 
also paves the way for concurrent or generational GC if we ever want to 
implement these features.

