Re: Merging scratch/no-purespace to remove unexec and purespace

emacs-devel

From:	Pip Cet
Subject:	Re: Merging scratch/no-purespace to remove unexec and purespace
Date:	Sun, 22 Dec 2024 13:13:50 +0000

Pip Cet <pipcet@protonmail.com> writes:

> However, I realize that (1) is currently a sheer guess. I haven't
> decided whether it's worth it to get an upper bound on the saved GC time
> by implementing a universal "tenured" set and performing a GC right
> after loading (which should be very fast, not marking any pdumped
> objects).

I did.  This got long again.  That's because I wanted to be really sure
that merging no-purespace isn't going to prevent worthwhile
optimizations in the future, and I am now.  Feel free to skip the rest
:-)

My initial results are that simply "tenuring" the char tables in the
pdump seems to have such a drastic effect that it's hard to perform a
fair measurement: process_mark_stack is called (in emacs -Q, no --batch)
21384 times if we "tenure" the char tables, and 135345 times if we
don't.

(This suggests that char tables may be worth optimizing for the "old"
GC: simply keep a set of GC-relevant values in the char table, and scan
that rather than scanning the entire char table.  However, we can't do
that with MPS, so I'm not overly interested in it.  Also, I doubt the
optimization decisions required for char tables would be made the same
way if they were reimplemented today, so it may be more productive to
start over from scratch, with a particular focus on reducing the time
needed for GC rather than ordinary performance.)
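
To make the "set of GC-relevant values" idea concrete, here is a minimal sketch.  It assumes a flat toy table rather than Emacs's real multi-level char table, and every name in it (toy_table, toy_table_set, toy_table_mark, ...) is hypothetical.  All stores go through one setter that keeps a bitmap of the slots holding pointers, and marking visits only those slots:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model, NOT Emacs's real char-table layout: a flat
   table whose slots hold either an immediate value (nothing to mark)
   or a pointer to a heap object (which the GC must mark).  */
enum { TABLE_SLOTS = 1024, BITS_PER_WORD = 64 };

struct toy_value
{
  bool is_pointer;   /* true if the GC has to trace this slot */
  void *pointer;     /* heap object, valid if is_pointer */
  long immediate;    /* fixnum-like payload otherwise */
};

struct toy_table
{
  struct toy_value slot[TABLE_SLOTS];
  /* One bit per slot, set iff the slot currently holds a pointer.  */
  uint64_t gc_relevant[TABLE_SLOTS / BITS_PER_WORD];
};

/* Every store goes through here so the bitmap stays in sync.  */
static void
toy_table_set (struct toy_table *t, size_t i, struct toy_value v)
{
  assert (i < TABLE_SLOTS);
  uint64_t bit = (uint64_t) 1 << (i % BITS_PER_WORD);
  if (v.is_pointer)
    t->gc_relevant[i / BITS_PER_WORD] |= bit;
  else
    t->gc_relevant[i / BITS_PER_WORD] &= ~bit;
  t->slot[i] = v;
}

/* Example mark callback: just counts how often it is called.  */
static size_t toy_marked;

static void
toy_count_mark (void *object)
{
  (void) object;
  toy_marked++;
}

/* Call MARK on the GC-relevant slots only, skipping 64 irrelevant
   slots at a time.  Return the number of slots visited.  */
static size_t
toy_table_mark (const struct toy_table *t, void (*mark) (void *))
{
  size_t visited = 0;
  for (size_t w = 0; w < TABLE_SLOTS / BITS_PER_WORD; w++)
    {
      uint64_t bits = t->gc_relevant[w];
      if (bits == 0)
        continue;
      for (size_t b = 0; b < BITS_PER_WORD; b++)
        if ((bits >> b) & 1)
          {
            mark (t->slot[w * BITS_PER_WORD + b].pointer);
            visited++;
          }
    }
  return visited;
}
```

The cost is one extra bitmap update per store, which is the trade mentioned above: slightly slower ordinary operation in exchange for less GC work.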

Also, we need to add a few check_writable calls to avoid segfaults.  I
should have expected that, I guess.

The good news is that few pdumped objects (256 once a non-batched Emacs
is started) actually appear to be written to, so it's not entirely
hopeless to identify those in one run and mark them non-tenured in the
real Emacs.

IOW, my tentative conclusion is that it's possible to perform such
optimizations after pure space is dropped, and there's no reason to
delay the merge.

Optimizing based on a *hint* that an object probably won't be mutated is
a potential way forward.
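
As a rough illustration of what "hint" means here, the following toy sketch treats tenured objects as leaves during marking and simply withdraws the hint when one of them is written to.  Nothing in it is real Emacs code: the object layout, toy_check_writable, and the fixed-size remembered list are all assumptions made for the example, and it assumes tenured objects refer only to other tenured objects until they are written to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy object; real Lisp objects look nothing like this.  */
struct toy_obj
{
  bool tenured;          /* hint: "probably never written again" */
  bool marked;
  struct toy_obj *ref;   /* single outgoing reference, may be NULL */
};

/* Objects whose hint turned out to be wrong.  They may be reachable
   only through tenured objects, which are never traced, so the GC
   has to treat them as extra roots.  */
enum { TOY_MAX_REMEMBERED = 16 };
static struct toy_obj *toy_remembered[TOY_MAX_REMEMBERED];
static size_t toy_n_remembered;

/* Hypothetical barrier, called before every store.  Writing to a
   tenured object is not an error; it only demotes the object.  */
static void
toy_check_writable (struct toy_obj *o)
{
  if (o->tenured)
    {
      assert (toy_n_remembered < TOY_MAX_REMEMBERED);
      o->tenured = false;
      toy_remembered[toy_n_remembered++] = o;
    }
}

static void
toy_set_ref (struct toy_obj *o, struct toy_obj *target)
{
  toy_check_writable (o);
  o->ref = target;
}

/* Mark the chain starting at O, stopping at tenured or already
   marked objects.  Return the number of objects visited.  */
static size_t
toy_mark (struct toy_obj *o)
{
  size_t work = 0;
  while (o && !o->tenured && !o->marked)
    {
      o->marked = true;
      work++;
      o = o->ref;
    }
  return work;
}

/* Mark from ROOT and from every demoted object.  */
static size_t
toy_gc (struct toy_obj *root)
{
  size_t work = toy_mark (root);
  for (size_t i = 0; i < toy_n_remembered; i++)
    work += toy_mark (toy_remembered[i]);
  return work;
}
```

The point of the design is that a wrong hint costs some extra marking later, not a segfault or a signaled error.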

Optimizing based on a hard promise that an object won't be mutated, as
the old purespace code does, not so much.  Even the old purespace code,
with the years of development it's seen, ended up losing the
optimization and causing preventable segfaults for valid-looking Elisp
code.

I must confess I'm fundamentally opposed to having objects come in a
"read-only" and a "read-write" flavor.  Either they should always be
immutable, as bignums and floats are now, or we should go to the
trouble of supporting the rare cases in which an object hinted or
guessed to be read-only turned out not to be.  (This is independent of
the question of whether the characters in a string can be changed or
not.)

It's very hard even to define what constitutes mutation of an object and
what doesn't.  Setting a symbol's global value is clearly a mutation in
the current code, but what if we keep those global values in a hash
table instead, and the struct Lisp_Symbol is never written to?  Does
lexically (or dynamically) binding a symbol mean the entire symbol is no
longer read-only?  If we ever implement hash-collision workarounds by
randomizing hash seeds, would re-seeding count as a mutation of the hash
table?  What about (aset v 0 (aref v 0))? Hash table resizing? Removing
dead keys from weak hash tables? Pinning a string to use it in a byte
code object?  Wouldn't it make sense to protect hash table (or obarray)
keys from mutation if that may result in irretrievable entries?
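
The last question can be illustrated with a deliberately simplified table.  This is a sketch under stated assumptions, not how Emacs hash tables or obarrays are implemented: keys are stored by reference, hashed by content, never rehashed, and there is no collision handling.  All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TOY_BUCKETS = 8 };

/* Hypothetical table: at most one entry per bucket, keys stored by
   reference (not copied) and never rehashed.  */
struct toy_hash
{
  char *key[TOY_BUCKETS];
  int value[TOY_BUCKETS];
};

/* Content-based hash (djb2-style), reduced to a bucket index.  */
static size_t
toy_hash_string (const char *s)
{
  size_t h = 5381;
  for (; *s; s++)
    h = h * 33 + (unsigned char) *s;
  return h % TOY_BUCKETS;
}

/* Store VALUE under KEY, in the bucket chosen by KEY's current
   contents.  */
static void
toy_put (struct toy_hash *t, char *key, int value)
{
  assert (key != NULL);
  size_t b = toy_hash_string (key);
  t->key[b] = key;
  t->value[b] = value;
}

/* Return a pointer to the value stored under KEY, or NULL.  */
static const int *
toy_get (const struct toy_hash *t, const char *key)
{
  size_t b = toy_hash_string (key);
  if (t->key[b] && strcmp (t->key[b], key) == 0)
    return &t->value[b];
  return NULL;
}
```

If the caller does toy_put with a buffer containing "abc" and then changes the buffer to "abd", the entry is still in the table but neither spelling finds it: "abc" hashes to the right bucket, where the stored key no longer compares equal, and "abd" hashes to a different, empty bucket.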

Most of these questions have two good answers, one which aids in
optimization, and one which Lisp programmers would expect.  They're
often different.

To get back to the no-purespace branch, I think we should consider
reintroducing check_writable () calls (which would currently be no-ops
on the master branch) after the merge, if we can agree on precisely when
this macro should be called and how.  The old locations of CHECK_IMPURE
can serve as a hint, but no more, so let's drop CHECK_IMPURE first and
start with a clean slate there.

Pip
