bug-gnu-emacs

bug#52753: 29.0.50; Printing long list-like structures fails


From: Mattias Engdegård
Subject: bug#52753: 29.0.50; Printing long list-like structures fails
Date: Fri, 24 Dec 2021 17:54:45 +0100

On 24 Dec 2021 at 11:56, Ihor Radchenko <yantar92@gmail.com> wrote:

> I understand that the current implementations of the read/prin1 are
> recursive. Yet, they are able to read ordinary lists in non-recursive
> way.

Serves me right for not being precise enough! The Emacs Lisp printer and reader 
are recursive in general, but no recursion is required for the standard list 
structure with links in the 'cdr' of each element. They do use recursion for 
reading and printing structures in the 'car' field, however. Consider:

(defun pearl (n)
  (let ((x 'grain-of-sand))
    (dotimes (_ n)
      (setq x (list x)))
    x))

If you try to print (pearl 10000) you probably get a bogus complaint about a 
possible circular list structure, and at least on my machine, evaluating

(let ((print-circle t))
  (print (pearl 100000)))

crashes Emacs, presumably from C stack exhaustion.

This is of course a bug, and while it would be nice to see it fixed (obviously), 
the effort required to do so may not stand in proportion to its severity -- but 
then again, reasonable people may disagree!

In particular, fixing it would entail a rather thorough reworking of the 
already quite complex reader and printer, replacing recursion with explicitly 
managed temporary heap-allocated stacks. Special care would be required not to 
make the common case more expensive, since it is so heavily used.
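To make the shape of such a fix concrete, here is a toy sketch of printing cons 
structure with an explicitly managed stack, so arbitrarily deep 'car' nesting no 
longer consumes C stack. This is not a patch: it handles only conses and atoms, 
and ignores print-circle, escaping and every other printer option.

```elisp
(defun toy-prin1 (obj)
  "Print OBJ like a bare-bones `prin1', using an explicit stack.
A sketch only: conses and atoms, no circularity detection, no
printer options.  Depth is bounded by the heap, not the C stack."
  (let ((close (make-symbol "close"))   ; unique markers; cannot occur in OBJ
        (tail  (make-symbol "tail"))
        (stack (list obj)))
    (while stack
      (let ((x (pop stack)))
        (cond
         ((eq x close) (princ ")"))
         ((and (consp x) (eq (car x) tail))
          (let ((rest (cdr x)))
            (cond ((null rest))                 ; proper end of list
                  ((consp rest)                 ; more elements to come
                   (princ " ")
                   (push (cons tail (cdr rest)) stack)
                   (push (car rest) stack))
                  (t (princ " . ")              ; dotted tail
                     (prin1 rest)))))
         ((consp x)
          (princ "(")
          (push close stack)
          (push (cons tail (cdr x)) stack)
          (push (car x) stack))
         (t (prin1 x)))))))
```

With something along these lines, (toy-prin1 (pearl 100000)) should run in 
bounded C-stack space, trading it for heap-allocated work items.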

An alternative would be writing a special reader and printer specifically for 
marshalling Lisp data structures: they could be faster and simpler because they 
wouldn't need to recognise the full Lisp syntax, nor respect the various reader 
and printer options. The result could also be made more compact since it 
wouldn't be intended for human readers.

However, the code would still need to detect shared structure (your skip-list 
uses it a lot!), and the basic Lisp syntax is fairly space-efficient, so it may 
be better to fix the stuff we already have. Maybe you want to give it a go?

> Skip list is conceptually very similar to ordinary list. Maybe the
> existing list reader functionality can be somehow adapted to read
> list-like structures?

Your skip list is nothing like an ordinary list; it is deeply recursive in the 
'car' slot, and even alternates conses and vectors for each level. Inserting 
the numbers 1..4 results in

[org-skip-list- 16 2 0.367879441171 #3=(org-skip-list--head . [(1 . [#2=(2 . 
[#1=(3 . [(4 . [nil]) nil]) #1#])]) #2# nil nil nil nil nil nil nil nil nil nil 
nil nil nil nil]) [#1# #1# #3# #3# #3# #3# #3# #3# #3# #3# #3# #3# #3# #3# #3# 
#3#] <]

where the hurting part is [(1 . [#2=(2 . [#1=(3 . [(4 . [... -- a depth of 2N 
for N elements. There is also a lot of shared structure (all the #k#) imposing 
further work on the printer and reader.

> And more generally, I suspect that M-x memory-report suffers from
> similar problem with prin1 - treating list-like structure recursively.

True! In one respect this is more severe, since someone invoking memory-report 
may already be somewhat short on memory, so a solution that uses an explicitly 
managed stack may fail from heap exhaustion, or from thrashing (which is 
usually worse). We could perhaps use the old flip-the-pointers trick (the 
Deutsch-Schorr-Waite algorithm) to do the counting in O(1) space. It works on 
general graphs but can be a bit slow.
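Nothing like this exists in memory-report today; purely to give the flavor, here 
is a sketch of pointer reversal. Lisp code cannot reach a spare mark bit on a 
cons, so this toy version uses three-slot vectors [MARK LEFT RIGHT] as nodes; a 
real version would live in C, where the mark bit is available.

```elisp
(defun dsw-count (root)
  "Count the vector nodes reachable from ROOT in O(1) extra space.
Links are reversed in place during the walk and restored on the way
back up.  Marks end up set, so each structure can be counted only
once without resetting them; shared nodes are counted once."
  (let ((done (make-symbol "done"))   ; unique bottom-of-chain marker
        (count 0)
        (cur root)
        par)
    (setq par done)
    (catch 'finished
      (while t
        ;; Descend leftwards, reversing LEFT links into a parent chain.
        (while (and (vectorp cur) (eql (aref cur 0) 0))
          (setq count (1+ count))
          (aset cur 0 1)
          (let ((next (aref cur 1)))
            (aset cur 1 par)
            (setq par cur
                  cur next)))
        ;; Ascend, restoring RIGHT links, until we can turn right somewhere.
        (while (progn
                 (when (eq par done)
                   (throw 'finished count))
                 (eql (aref par 0) 2))
          (let ((gp (aref par 2)))    ; grandparent stored in RIGHT slot
            (aset par 2 cur)          ; restore the right child
            (setq cur par
                  par gp)))
        ;; Coming back from the left: restore LEFT, descend into RIGHT.
        (aset par 0 2)
        (let ((gp (aref par 1)))      ; grandparent stored in LEFT slot
          (aset par 1 cur)            ; restore the left child
          (setq cur (aref par 2))
          (aset par 2 gp))))))
```

For example, (dsw-count (vector 0 (vector 0 'x 'y) 'z)) counts the two nodes 
and leaves all links as they were, having used only the three loop variables.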

> I know that balanced trees have the same asymptotic behaviour with skip
> lists and better worst-case behaviour. However, the original Pugh's paper
> introducing skip lists did a comparison between skip list and
> self-balancing trees.

That's true (I remember being fascinated by that paper) but it was written in 
1989 and some things have changed since. In particular, memory locality matters 
more today. Random number generators have made strides but they are often 
expensive, and the one in Emacs isn't likely to be particularly fast (nor good).

Skip-lists have some properties that make them interesting for some uses, such 
as being fairly amenable to lock-free operation, but that's not of interest in 
Emacs today.

> The common operations on the cache are: (1) random element
> lookups (element at point while user edits the Org buffer); (2)
> iterating cache elements one-by-one to filter them (agenda views,
> column views, sparse trees, etc); (3) Deleting or inserting a series of
> adjacent elements upon file modification (normal user edits). (4)
> Combination of cache iteration with adding/removing elements in-place
> (agenda lookups when parts of the cache should be calculated as we find
> that some elements are not yet in cache).

(3) is the killer here since it rules out hash tables, or does it? Is 
maintaining element order a strict requirement? Otherwise just go with hash 
tables.

> The benchmarks (see code below) of skip list vs. AVL-tree give the
> following results:

First let me commend you for making actual measurements! Of course, the usual 
benchmarking caveats apply.

> - Insertion of 200k subsequent numbers (0, 1, 2, ...): 0.44 sec (often 
> triggers garbage collection)

Most data structures allow for fast insertion of consecutive elements as a 
special operation, typically in more-or-less linear time. I'm sure we could add 
it to the standard AVL implementation if it would make a difference.
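For illustration, the usual trick for sorted input, sketched with a hypothetical 
plain node shape (VALUE LEFT . RIGHT) rather than avl-tree.el's internals:

```elisp
(defun tree-from-sorted (list n)
  "Build a perfectly balanced tree from the first N elements of sorted LIST.
Return a cons (TREE . REMAINING-LIST).  Runs in O(N) time with only
O(log N) recursion depth.  Node shape: (VALUE LEFT . RIGHT); an empty
tree is nil.  A sketch, not avl-tree.el code."
  (if (zerop n)
      (cons nil list)
    (let* ((nl (/ (1- n) 2))            ; sizes of the two subtrees
           (nr (- n 1 nl))
           (left (tree-from-sorted list nl))
           (value (car (cdr left)))     ; next unconsumed element is the root
           (right (tree-from-sorted (cdr (cdr left)) nr)))
      (cons (cons value (cons (car left) (car right)))
            (cdr right)))))
```

A real AVL variant would additionally fill in the balance or height fields, 
which is easy since every subtree built this way is as balanced as possible.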

Since I'm not sure what your keys are (integers, strings, ...) nor their 
distribution (dense, sparse, clustered, ...), it's hard to recommend a specific 
structure, but some alternatives are:

- red-black trees: typically similar to AVL trees but often easier to implement
- radix trees: very fast for dense or sparse-but-clustered integer keys
- B-trees: often very fast because of their good locality; many variants
- tries: many variants, typically tailored to a specific key type (e.g. ASCII 
strings).
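To give the flavor of the radix-tree bullet, here is a toy bitwise trie over 
non-negative fixnums (names and node shape are made up for the example):

```elisp
(defun bit-trie-put (node key value)
  "Store KEY -> VALUE in the trie rooted at NODE; return the root.
NODE is nil or [VALUE ZERO-CHILD ONE-CHILD]; KEY's bits are consumed
from the low end, stopping when the remaining key is zero."
  (unless node (setq node (vector nil nil nil)))
  (if (zerop key)
      (aset node 0 value)
    (let ((slot (if (zerop (logand key 1)) 1 2)))
      (aset node slot (bit-trie-put (aref node slot) (ash key -1) value))))
  node)

(defun bit-trie-get (node key)
  "Return the value stored under KEY in the trie rooted at NODE, or nil."
  (while (and node (not (zerop key)))
    (setq node (aref node (if (zerop (logand key 1)) 1 2))
          key (ash key -1)))
  (and node (aref node 0)))
```

Depth is the bit length of the key, so dense small integers stay shallow; real 
radix trees additionally compress runs of single-child nodes.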

I'm not sure if your usage benefits from persistence but it's nice to have in 
many cases.

Then there are implementation aspects to consider. For example, suppose you 
want to store three values in a tree node (value and left and right subtrees). 
You could use:

- vector: [L R X]
- list: (L R X)
- improper list: (L R . X)

They all have their advantages and drawbacks, and you don't know which one is 
best until you measure. The result is often counter-intuitive. Also remember 
that Elisp has efficient boolean vectors, so if you need an extra bit per node 
it may be more economical to gather them all in one vector instead of wasting 
8-16 bytes for each bit. Even bignums can be harnessed for the occasional 
profit.
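Concretely, the three node shapes might look as follows (accessor names are 
made up; the word counts are the usual 64-bit figures: two words per cons, one 
header word per vector):

```elisp
;; vector [L R X]: one indexed load per slot; header + 3 slots = 4 words.
(defsubst node-val-vec (n) (aref n 2))
;; list (L R X): three cons cells = 6 words; access chases two cdrs and a car.
(defsubst node-val-list (n) (nth 2 n))
;; improper list (L R . X): two cons cells = 4 words; X is the final cdr.
(defsubst node-val-dotted (n) (cddr n))
```

Which wins also depends on the access pattern and on GC behaviour, hence the 
advice to measure.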

> In summary, skip list appears to give benefit in terms to overall speed
> (within context of grammar cache). The current implementation of skip
> list + org-element-cache is slightly faster compared to AVL tree and
> feels much snappier.

It could very well be, but it's far from a double-blind trial. Few things are 
without limit; one of them is Man's capacity for self-deception.

> The other aspect is ease of usage. AVL tree is trickier to work with
> (see below code example).

Looks like you are using it in a somewhat unorthodox way; those double hyphens 
are there for a reason. If you implement a data structure for your own use, you 
will of course make it handy for those specific purposes. It is not uncommon 
for data structures to provide cursors that help with your kind of local 
in-place editing.
