emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: MPS: weak hash tables


From: Pip Cet
Subject: Re: MPS: weak hash tables
Date: Tue, 02 Jul 2024 15:03:44 +0000

On Tuesday, July 2nd, 2024 at 13:42, Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Gerd Möllmann gerd.moellmann@gmail.com
> 
> > Cc: pipcet@protonmail.com, eller.helmut@gmail.com, emacs-devel@gnu.org
> > Date: Tue, 02 Jul 2024 15:16:54 +0200
> > 
> > Eli Zaretskii eliz@gnu.org writes:
> > 
> > > Do we even use weak objects in Emacs on this branch? If we do, what
> > > do we use them for?
> > 
> > Yes, Pip's weak hash tables, and vectors for BUF_MARKERS.
> 
> So I guess they will work better in the 32-bit builds than in the
> 64-bit ones? Because AFAIU what that text says is that they
> implemented this feature only in the IA-32 builds, and in the other
> cases the behavior is sub-optimal.

The code needs fixing for IA-32, and will conceivably be a little faster on 
those machines than it is on 64-bit machines. This will probably be noticeable 
only for very large hash tables, but may be noticeable for normal numbers of 
markers in a buffer.

Having looked at the MPS code, and sorry for being verbose:

* all this is relevant to large objects, which MPS's design doesn't really 
favor. For strong references, you scan an entire object or none of it. This is 
a potential problem since Emacs can have very large objects.
* For weak references, it's even worse, because when you scan an entire object 
to satisfy a client request to it (which triggered a protection signal), MPS 
uses a "strong" scan and doesn't splat any references.
* so, only for weak references, and only when certain CPU instructions are 
used, and only on 32-bit x86 machines (there's some code for x86-64 and 
aarch64, as well as macOS, but not enough to actually do anything) on Windows 
and Linux, there's a special optimization which avoids scanning the entire 
object: a machine instruction is simulated.
* Since the point is to avoid calling the scan function for the entire object, 
MPS needs a different algorithm to decide whether the word is a weak (exact) 
reference, or plain old data that doesn't refer to anything.
* They decided to treat all multiple-of-four values (except for 0) as 
references, and all with a remainder when divided by four as non-references. 
That clashes with our usual Lisp_Object convention, but miraculously works out 
for fixnums, whose integer value is congruent to 2 modulo 4. No, I don't know 
why they didn't use a special sub-scan function which takes a range into the 
object (possibly they don't remember object starts at all?)

That means we must do one of the following:

1. mangle all Lisp_Objects to pointers or fixnums when storing them in a weak 
hash table, and unmangle them upon retrieval
2. not use 32-bit x86 machines
3. modify MPS
4. throw caution to the wind and just hope it works

I started implementing (1) (which is why the hashes are now fixnums rather than 
uint32_t), then realized I was doing (2). I looked at MPS and the modifications 
to disable this code, while trivial, cannot be performed without rebuilding the 
library. And, yes, I'm afraid on 32-bit x86 we're currently using option (4), 
which is very annoying.

I'm not at all sure what the right thing to do is in our present situation. I'm 
not opposed to mangling Lisp_Objects, provided we can revise struct igc_header 
to contain the three Emacs tag bits directly. However, performance of large 
weak hash tables will be poor on all machines.

Any advice would be appreciated.

Pip



reply via email to

[Prev in Thread] Current Thread [Next in Thread]