Re: New GC concept

emacs-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: New GC concept

From:	Daniel Colascione
Subject:	Re: New GC concept
Date:	Fri, 4 Jun 2021 02:47:32 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1

On 6/4/21 1:00 AM, Daniel Mendler wrote:

Interesting, thank you for working on this!

I had hoped that a new GC would surface at some point given the recent
improvements regarding native compilation. As you say this can bring
Emacs on par with modern managed language runtimes. Can you elaborate a
bit more of some of your concepts?


Thanks for taking a look.

* fully copying and compacting

How do you ensure that compaction works together with the conservative
stack scanning? You pin memory blocks, which are potentially referenced
by the stack?

Yes, pinning is how we combine conservative stack scanning with acopying collector. We don't pin whole memory blocks though. We pin atobject granularity. (Things like Hosking's "mostly copying collector"use block pinning IIRC, but we can do much better these days.)

Just as each object has a mark bit, each object has a pin bit. We pinonly those specific objects that conservative scanning flags aspotentially referenced from native code. We still copy pinned objectsfrom the from-space to the to-space actually --- it's important copypinned objects because it's during copying that we update all thepointers that a pinned object might contain. Pinning just ensures thatwe copy in a specific way such that after we're done with GC and swapthe to-space and from-space, each pinned object ends up at the samevirtual address it had before GC started. This way, although we *do*copy pinned objects, the mutator never observes a pinned object changingposition.

The pin bits end up using very little memory because they're storedcontiguously in side arrays and almost entirely zero, and each zero pageshares the same backing RAM until something makes it non-zero. Like Imentioned in the new alloc.c, address space is abundant.

* generational

Do you rely on the OS memory protection as a write barrier to separate
the different generations, similar to how the bdwgc does that?

Correct. IMHO, it's not practical to retrofit write barriers or readbarriers into Emacs.

* specialized GC spaces for conses, strings, arrays, and so on: no

stupid header word for cons cells bloating memory use by 50%!

Avoiding the headers is a good idea. You are using a first fit strategy
to find the next free space for a new object. How do you use the
headerless approach for objects of different sizes?


We don't. :-)

In the new GC, the overall Emacs heap is divided into "heaps" forvarious object types; each heap has its own list of blocks and its ownheap memory format. The heaps for fixed-size objects like cons cells andintervals don't have headers. The heaps for variable-sized objects likestrings and vectorlikes *do* use conventional object headers.

Isn't it the case
that every block should then contain objects of only a single type and
of a single size?

Some heaps (most importantly, the vectorlike heap) do supportvariable-sized objects, and blocks belonging to these heap types containa mixture of object types.

  Probably most of the objects fall in a few size
classes, so it may be possible to do better than first fit?

First-fit is better than it sounds in the context of a compactingcollector. First-fit allocation always (except in two cases describedbelow) succeeds on the first try: because each GC pass compacts all theobjects at the "start" of the heap, and we start first-fit allocationfrom the end of the last compacted object. That's why I wrote that thefirst-fit allocation scheme is equivalent in practice to bump-pointerallocation.


The two cases where we fail first-fit allocation are:

1) we're in a heap that supports variable-sized objects and there's notenough space in the current block to hold the object we're allocating, and

2) there's a pinned object "in the way" of where we want to place theobject via first-fit allocation.

#1 isn't a problem in practice: if we're trying to allocate an objectthat's too big to place in the tail of the object's heap's currentblock, we allocate a new block and put the new object there instead. Thenew object is guaranteed to fit in the new block because we allocatelarger-than-block objects in a separate storage area, as is traditionalin GCs of this sort. (See the large vector list.) When we move to a newblock this way, we don't commit the memory of the tail of the previousblock, so moving to the next block is practically free, modulo page-tailwastage.

#2 isn't a problem either: pinned objects are rare, and when we doencounter one, we can "skip over" it efficiently using the free-objectbitmap. Modern machines are really good at streaming analysis of bitarrays: we don't even need a freelist embedded in the heap, like Emacscurrently has for conses. Scanning a bitmap is both simpler and kinderto the cache. Because pinned objects are rare, because pins aretransient, and because each GC pass is a compacting pass, first-fitdoesn't lead to the fragmentation that it normally causes in things likemalloc implementations.

Overall is your design similar to the approach of the bdwgc plus that a
memory/object layout tailored to the needs of Emacs and the compaction?
How well does such a GC hold up to a GC which is precise and does not
rely on the OS facilities for barriers? It appears such a precise GC is
impossible to retrofit on the existing Elisp runtime, so I assume your
approach is the right way to go.

Most other GCs use software barriers, true. But even that's changing.Relying on OS facilities for barriers has an important advantage: itreduces code size. If we used the non-OS-level facility in a nativecompilation world, we'd have to emit a write barrier before *every*mutator write (~6 instructions). These barriers add up and bloat thegenerated code. If we use OS memory protection instead, the generatedcode can be a lot smaller. Plus, using OS facilities, we don't have tochange the rest of the Emacs C core.

[Prev in Thread]

Current Thread

[Next in Thread]

New GC concept, Daniel Colascione, 2021/06/03
- Re: New GC concept, Daniel Mendler, 2021/06/04
  - Re: New GC concept, Daniel Colascione <=
    - Re: New GC concept, Eli Zaretskii, 2021/06/04
    - Re: New GC concept, Fejfighter, 2021/06/21
    - Re: New GC concept, Eli Zaretskii, 2021/06/21
    - Re: New GC concept, Daniel Colascione, 2021/06/21
    - Re: New GC concept, Daniel Mendler, 2021/06/04
- Re: New GC concept, Andrea Corallo, 2021/06/04
- Re: New GC concept, Matt Armstrong, 2021/06/07
  - Re: New GC concept, Daniel Colascione, 2021/06/07
    - Re: New GC concept, Daniele Nicolodi, 2021/06/07
    - Re: New GC concept, Eli Zaretskii, 2021/06/07

Prev by Date: Re: New GC concept
Next by Date: Re: cc-mode fontification feels random
Previous by thread: Re: New GC concept
Next by thread: Re: New GC concept
Index(es):
- Date
- Thread