emacs-devel

Re: Some experience with the igc branch


From: Pip Cet
Subject: Re: Some experience with the igc branch
Date: Fri, 27 Dec 2024 14:34:22 +0000

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Date: Thu, 26 Dec 2024 15:24:14 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: gerd.moellmann@gmail.com, ofv@wanadoo.es, emacs-devel@gnu.org, 
>> eller.helmut@gmail.com, acorallo@gnu.org
>>
>> "Eli Zaretskii" <eliz@gnu.org> writes:
>>
>> >> Date: Wed, 25 Dec 2024 17:40:42 +0000
>> >> From: Pip Cet <pipcet@protonmail.com>
>> >> Cc: Eli Zaretskii <eliz@gnu.org>, ofv@wanadoo.es, emacs-devel@gnu.org, 
>> >> eller.helmut@gmail.com, acorallo@gnu.org
>> >>
>> >> I haven't seen a technical argument against using separate stacks for
>> >> MPS and signals
>> >
>> > You haven't actually presented it.
>>
>> That's correct: we have an idea and a PoC, no design to discuss or
>> anything close to a proposal, at this point.
>>
>> My idea was to ask for obvious problems precluding or complicating this
>> approach.
>
> OK, but still, since you wrote the code to implement it, I guess you
> have at least some initial design ideas?  I hoped you could describe
> those ideas, so we could better understand what you have in mind, and
> provide a more useful feedback about possible problems, if any, with
> those ideas.

The idea is that the main thread, after initialization, never calls into
MPS itself.

Instead, we create an allocation thread, reacting to messages from the
main thread.

The allocation thread never actually does anything in parallel with the
main thread: its purpose is to provide a separate stack, not
parallelization.

All redirected MPS calls wait synchronously for the allocation thread to
respond.
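
To make the shape of this concrete, here is a minimal sketch of such a
synchronous hand-off using POSIX threads.  It is not the code from the
PoC: every name in it (alloc_request, redirected_alloc, and so on) is
made up for illustration, and malloc stands in for whatever MPS call
would really be redirected.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request slot shared by the main thread and the
   allocation thread.  None of these names come from the branch.  */
struct alloc_request
{
  pthread_mutex_t lock;
  pthread_cond_t posted;    /* main thread -> allocation thread */
  pthread_cond_t answered;  /* allocation thread -> main thread */
  bool pending;
  bool done;
  bool quit;
  size_t size;
  void *result;
};

static struct alloc_request req = {
  .lock = PTHREAD_MUTEX_INITIALIZER,
  .posted = PTHREAD_COND_INITIALIZER,
  .answered = PTHREAD_COND_INITIALIZER,
};

/* Runs on its own stack.  A real version would call into MPS where
   this sketch calls malloc.  */
static void *
alloc_thread_main (void *arg)
{
  (void) arg;
  pthread_mutex_lock (&req.lock);
  for (;;)
    {
      while (!req.pending && !req.quit)
        pthread_cond_wait (&req.posted, &req.lock);
      if (req.quit)
        break;
      req.result = malloc (req.size);  /* stand-in for the MPS call */
      req.pending = false;
      req.done = true;
      pthread_cond_signal (&req.answered);
    }
  pthread_mutex_unlock (&req.lock);
  return NULL;
}

/* Called on the main thread: post one request, then block until the
   allocation thread has answered it.  */
static void *
redirected_alloc (size_t size)
{
  pthread_mutex_lock (&req.lock);
  assert (!req.pending);  /* only one request is ever in flight */
  req.size = size;
  req.done = false;
  req.pending = true;
  pthread_cond_signal (&req.posted);
  while (!req.done)
    pthread_cond_wait (&req.answered, &req.lock);
  void *p = req.result;
  pthread_mutex_unlock (&req.lock);
  return p;
}

/* Ask the allocation thread to exit and wait for it.  */
static void
stop_alloc_thread (pthread_t thread)
{
  pthread_mutex_lock (&req.lock);
  req.quit = true;
  pthread_cond_signal (&req.posted);
  pthread_mutex_unlock (&req.lock);
  pthread_join (thread, NULL);
}
```

The main thread would start the worker once with pthread_create
(&thread, NULL, alloc_thread_main, NULL) and then go through
redirected_alloc for every call.  Each call pays for a round trip
through the mutex and two condition variables.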

This includes the MPS SIGSEGV handler, which calls into MPS, so it must
be directed to another thread.

All this makes the previously fast allocation path very slow, and we
need a workaround for that:

We ensure that we allocate at least 1MB (magic number here) at a time,
then split the area into MPS objects when we need to.  The assumption
that we can split MPS allocations is significant but justifiable,
because MPS will be in the same state after two successful back-to-back
allocations and a single allocation combining the two.
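
As a rough illustration of the splitting, here is a sketch of a bump
allocator that asks the slow path for a large area and carves objects
out of it.  The names (slow_reserve, fast_alloc) and the 8-byte
alignment are assumptions for the example, malloc stands in for the
redirected MPS allocation, and the unused tail of an exhausted area is
simply abandoned, which a real implementation could not do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { CHUNK_SIZE = 1 << 20, ALIGNMENT = 8 };  /* 1MB, the magic number */

/* Hypothetical stand-in for the slow, redirected allocation call;
   the counter shows how rarely it is reached.  */
static size_t slow_calls;

static char *
slow_reserve (size_t size)
{
  slow_calls++;
  return malloc (size);
}

/* Unsplit remainder of the current area.  */
static char *chunk_cur, *chunk_end;

static size_t
round_up (size_t n)
{
  return (n + ALIGNMENT - 1) & ~(size_t) (ALIGNMENT - 1);
}

/* Carve the next object out of the current area; fall back to the
   slow path only when the area cannot hold the request.  */
static void *
fast_alloc (size_t size)
{
  assert (size > 0);
  size = round_up (size);
  if (chunk_cur == NULL || (size_t) (chunk_end - chunk_cur) < size)
    {
      size_t reserve = size > CHUNK_SIZE ? size : CHUNK_SIZE;
      chunk_cur = slow_reserve (reserve);
      if (chunk_cur == NULL)
        return NULL;
      chunk_end = chunk_cur + reserve;
    }
  void *obj = chunk_cur;
  chunk_cur += size;
  return obj;
}
```

With this, two back-to-back calls such as fast_alloc (24) and
fast_alloc (40) return adjacent addresses and reach the slow path only
once.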

dflt_skip must never lie to MPS about the size of an object, though.
Once dflt_skip has told MPS how to skip an object (i.e. how large it
is), we can no longer split that object.  It is another significant but
justifiable assumption that this happens rarely enough.
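
A toy model of that rule, to show what "can no longer split" means in
practice.  This is not the real dflt_skip: the struct area, the
per-object size header and the function names are all invented for the
sketch.  The only point is that asking for the size of the unsplit tail
freezes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: every carved object starts with its total
   size; the unsplit tail of the area has no header yet.  */
struct header { size_t size; };

struct area
{
  char *cur;    /* start of the unsplit tail */
  char *end;    /* end of the whole area */
  bool frozen;  /* set once the tail's size has been reported */
};

/* Split a new object off the tail, unless the tail is frozen or
   too small.  */
static void *
area_alloc (struct area *a, size_t payload)
{
  size_t size = sizeof (struct header) + payload;
  size = (size + sizeof (size_t) - 1) & ~(sizeof (size_t) - 1);
  if (a->frozen || (size_t) (a->end - a->cur) < size)
    return NULL;  /* caller needs a fresh area */
  struct header *h = (struct header *) a->cur;
  h->size = size;
  a->cur += size;
  return h + 1;
}

/* Return the address just past the object at ADDR.  Asked about the
   unsplit tail, report it as one object and stop splitting it, so
   the answer stays true from then on.  */
static char *
area_skip (struct area *a, char *addr)
{
  assert (addr < a->end);
  if (addr == a->cur)
    {
      a->frozen = true;
      return a->end;
    }
  return addr + ((struct header *) addr)->size;
}
```

Skipping an object that was already carved leaves the tail splittable.
Skipping the tail itself makes every later area_alloc on that area
fail, so the caller has to request a new area.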

> In general, as I wrote earlier, there's nothing problematic with
> adding a C thread to Emacs.  But since (AFAIU) the suggestion is to
> run MPS from that thread, I think we should understand in more detail
> how can GC be run from a separate thread.  I expect that to have at
> least some impact on the Emacs code elsewhere, since the original
> Emacs design assumed that GC runs synchronously, and the rest of the
> Lisp machine is stopped while it does.

Thanks for explaining.

I don't think that's a new problem (when comparing the allocation thread
code to scratch/igc), as the allocation thread does not trigger GC any
more spontaneously than the main thread would.  The spontaneous garbage
collection you're worried about can be triggered by another thread
allocating memory while the main thread is busy inspecting it, but the
allocation thread only allocates memory while the main thread is
waiting, so this cannot happen.

It's safe to assume no MPS collection happens when all of the following
hold:

1. there is no other thread which might trigger a memory barrier (the
allocation thread doesn't)
2. there is no other thread which might allocate memory (the allocation
thread cannot do so while the main thread is in a critical section)
3. we don't allocate memory
4. we don't trigger memory barriers

In practice, (4) is very hard to guarantee, so it might be easier to
decide now that code should always be written to assume spontaneous GC
is possible no matter where we are, which is the first step to actually
enabling fully concurrent GC.  Once we have made that decision,
we can actually test whether it breaks things to trigger spontaneous GCs
from another thread (I've experimented with this, and IIRC I fixed a bug
this uncovered, but only because that bug could also have occurred
without spontaneous GC).  Once we've done that, we can seriously
consider whether spontaneous GCs might be good for performance or
usability rather than debugging.

>> I've found a few minor things; so far, nothing unfixable, and no
>> significant effects on performance, but the fixes will have to become
>> part of the design and discussion.
>
> Right, so I'm asking to describe these aspects, so that others could
> consider them and possibly additional issues, and provide feedback or
> raise concerns about that.

The main aspect is "dflt_skip must never lie, but it can delay deciding
what the truth is until it's called".  We keep eating ice cream until
we're asked how much we've had, at which point we answer truthfully and
stop eating.

>> I think rr (time-travel/reverse debugging with acceptable performance)
>> support is important, but I think I'm the only one? It seems to be
>> really slow on this branch, though I don't know how fast it is on
>> scratch/igc.
>
> Well, reverse debugging currently doesn't work on Windows, so at least
> for that platform we cannot rely on that.

Considering that rr is in a kind of arms race anyway (rseqs, hardware
lock elision, the E-core/P-core split, and spectre workarounds all broke
rr when introduced, and require workarounds, and CPU manufacturers
appear to be fundamentally unable to agree on when an instruction should
be counted as "retired"), relying on it may be a bad idea.

Unfortunately, qemu doesn't seem to be seeing very active development in
this area, so the qemu instruction counting mechanism is unlikely to
provide usable reverse debugging in many situations.  And plain old GDB
reverse debugging is unbearably slow (and has been for decades), AFAIK.

So it's not a safe bet that we will continue to have usable reverse
debugging.  It may become even more necessary to have the compiler or
the CPU assist in providing it.

BTW, speaking of debugging, there's always a nuclear option for signals:
Use ptrace from another process to step through MPS and resend the
signal at the first possible moment.  Breaks GDB, hard to fix.

Pip



