bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#68244: hash-table improvements


From: Dmitry Gutov
Subject: bug#68244: hash-table improvements
Date: Fri, 5 Jan 2024 17:41:47 +0200
User-agent: Mozilla Thunderbird

On 05/01/2024 12:33, Mattias Engdegård wrote:
[ replies to questions asked elsewhere ]

4 jan. 2024 kl. 18.32 skrev Dmitry Gutov <dmitry@gutov.dev>:

+hash_table_alloc_bytes (ptrdiff_t nbytes)
+{
+  if (nbytes == 0)
+    return NULL;
+  tally_consing (nbytes);
+  return xmalloc (nbytes);
+}

Sorry if it's a stupid question, but if the operation doesn't add any Lisp 
"garbage", why increase the consing counter? That is likely triggers more GCs 
earlier which otherwise might not run, and if there are no slots to GC, it seems like 
they would run in vain.

That's a good question and it all comes down to how we interpret 
`consing_until_gc`. Here we take the view that it should encompass all parts of 
an allocation and this seems to be consistent with existing code.

But the existing code used objects that would need to be collected by GC, right? And the new one, seemingly, does not.

So I don't quite see the advantage of increasing consing_until_gc then. It's like the difference between creating new strings and inserting strings into a buffer: new memory is used either way, but the latter doesn't increase consing.

These ancillary allocations are parts of the hash-table object: they are 
allocated and freed at the same time as the mother object, and subject to the 
same `consing_until_gc` accounting.

The new code is radically more efficient when growing tables, because we can 
free the old vectors immediately (and adjust consing_until_gc accordingly) 
instead of leaving it as garbage waiting to be collected. This provides several 
benefits in itself (GCs are made less often, we can re-use hot memory). Not 
having to traverse them in either the mark or sweep phases is another big gain 
(the key_and_value parts still have to be marked, of course).

It's great that the new hash tables are garbage-collected more easily and produce less garbage overall, but in a real program any GC cycle will have to traverse the other data structures anyway. So we might be leaving free performance gains on the table when we induce GC cycles while no managed allocations are done. I could be missing something, of course.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]