bug#52753: 29.0.50; Printing long list-like structures fails
From: Mattias Engdegård
Subject: bug#52753: 29.0.50; Printing long list-like structures fails
Date: Fri, 24 Dec 2021 17:54:45 +0100
On 24 Dec 2021, at 11:56, Ihor Radchenko <yantar92@gmail.com> wrote:
> I understand that the current implementations of the read/prin1 are
> recursive. Yet, they are able to read ordinary lists in non-recursive
> way.
Serves me right for not being precise enough! The Emacs Lisp printer and reader
are recursive in general, but no recursion is required for the standard list
structure with links in the 'cdr' of each element. They do use recursion for
reading and printing structures in the 'car' field, however. Consider:
(defun pearl (n)
  (let ((x 'grain-of-sand))
    (dotimes (_ n)
      (setq x (list x)))
    x))
If you try to print (pearl 10000), you probably get a bogus complaint about a
possible circular list structure, and at least on my machine, evaluating

(let ((print-circle t))
  (print (pearl 100000)))

crashes Emacs, presumably from C stack exhaustion.
This is of course a bug and while it would be nice to see it fixed (obviously),
the effort required to do so may not necessarily stand in proportion to the
severity of the bug -- but then again, reasonable people may disagree!
In particular, fixing it would entail a rather thorough reworking of the
already quite complex reader and printer, replacing recursion with explicitly
managed temporary heap-allocated stacks. Special care would be required not to
make the common case more expensive, since it is so heavily used.
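To sketch the idea of replacing C recursion with an explicitly managed heap stack, here is a toy iterative printer that emits everything in dotted-pair notation. The function name and the token scheme are made up for illustration, only conses are descended into (the real printer must also handle vectors, records, print options and so on), and no attempt is made at the common-case optimisations mentioned above:

```elisp
(defun my-print (obj)
  "Sketch: print OBJ in dotted-pair notation using an explicit stack.
The stack lives on the heap, so deeply car-nested structures such as
\(pearl 100000) cannot exhaust the C stack."
  (let* ((lit (make-symbol "lit"))   ; unique tag, cannot clash with data
         (stack (list obj))
         (acc ""))
    (while stack
      (let ((x (pop stack)))
        (cond
         ;; A (lit . STRING) pair is a literal token we queued earlier.
         ((and (consp x) (eq (car x) lit))
          (setq acc (concat acc (cdr x))))
         ;; A cons prints as "(" CAR " . " CDR ")"; push the pieces in
         ;; reverse order so they pop off left to right.
         ((consp x)
          (push (cons lit ")") stack)
          (push (cdr x) stack)
          (push (cons lit " . ") stack)
          (push (car x) stack)
          (push (cons lit "(") stack))
         ;; Atoms are delegated to the ordinary printer.
         (t (setq acc (concat acc (prin1-to-string x)))))))
    acc))

;; (my-print (pearl 3)) => "(((grain-of-sand . nil) . nil) . nil)"
```

The same shape works for reading: the parser keeps its pending containers on a Lisp-level stack instead of in C activation records.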
An alternative would be writing a special reader and printer specifically for
marshalling Lisp data structures: they could be faster and simpler because they
wouldn't need to recognise the full Lisp syntax, nor respect the various reader
and printer options. The result could also be made more compact since it
wouldn't be intended for human readers.
However, the code would still need to detect shared structure (your skip list
uses it a lot!), and the basic Lisp syntax is fairly space-efficient, so it
may be better to fix the stuff we already have. Maybe you want to give it a go?
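For the shared-structure part, a first pass can count references with an 'eq' hash table and an explicit stack; anything seen more than once would then need a #N= label in the output. A minimal sketch (the function name is made up, and only conses and vectors are handled):

```elisp
(defun my-count-refs (obj)
  "Sketch: return a hash table mapping each cons or vector reachable
from OBJ to its reference count.  Iterative, so safe on deep structure;
the first-visit guard also makes it terminate on circular data."
  (let ((seen (make-hash-table :test #'eq))
        (stack (list obj)))
    (while stack
      (let ((x (pop stack)))
        (when (or (consp x) (vectorp x))
          (let ((n (gethash x seen 0)))
            (puthash x (1+ n) seen)
            (when (zerop n)             ; first visit: descend into children
              (if (consp x)
                  (progn (push (car x) stack)
                         (push (cdr x) stack))
                (dotimes (i (length x))
                  (push (aref x i) stack))))))))
    seen))
```

A second pass would then print, emitting #N= at the first occurrence of any object whose count exceeds one, and #N# thereafter.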
> Skip list is conceptually very similar to ordinary list. Maybe the
> existing list reader functionality can be somehow adapted to read
> list-like structures?
Your skip list is nothing like an ordinary list; it is deeply recursive in the
'car' slot, and even alternates conses and vectors for each level. Inserting
the numbers 1..4 results in
[org-skip-list- 16 2 0.367879441171 #3=(org-skip-list--head . [(1 . [#2=(2 .
[#1=(3 . [(4 . [nil]) nil]) #1#])]) #2# nil nil nil nil nil nil nil nil nil nil
nil nil nil nil]) [#1# #1# #3# #3# #3# #3# #3# #3# #3# #3# #3# #3# #3# #3# #3#
#3#] <]
where the hurting part is [(1 . [#2=(2 . [#1=(3 . [(4 . [... -- a depth of 2N
for N elements. There is also a lot of shared structure (all the #k#) imposing
further work on the printer and reader.
> And more generally, I suspect that M-x memory-report suffers from
> similar problem with prin1 - treating list-like structure recursively.
True! In one respect this is more severe, since someone invoking memory-report
may already be somewhat short on memory, so a solution that uses an explicitly
managed stack may fail from heap exhaustion, or from thrashing (which is
usually worse). We could perhaps use the old flip-the-pointers trick to do the
counting in O(1) space (first demonstrated by Knuth, I believe). It works on
general graphs but can be a bit slow.
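The one-dimensional version of that trick, on a plain cdr-linked list, looks like this (a sketch only; the general algorithm additionally flips 'car' pointers and tags visited cells so it can walk arbitrary graphs):

```elisp
(defun my-count-conses (list)
  "Sketch: count the conses of LIST in O(1) extra space by temporarily
reversing the cdr pointers.  The list is restored before returning, but
the structure is unusable by anyone else while the walk is in progress."
  (let ((prev nil) (cur list) (n 0))
    ;; Walk forward, flipping each cdr to point backwards.
    (while (consp cur)
      (setq n (1+ n))
      (let ((next (cdr cur)))
        (setcdr cur prev)
        (setq prev cur
              cur next)))
    ;; Walk back, restoring the original cdr pointers.
    ;; At this point `cur' holds the original terminator (nil or an atom).
    (let ((fwd cur))
      (while (consp prev)
        (let ((back (cdr prev)))
          (setcdr prev fwd)
          (setq fwd prev
                prev back))))
    n))

;; (my-count-conses (list 1 2 3)) => 3, with the list left intact.
```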
> I know that balanced trees have the same asymptotic behaviour with skip
> lists and better worst-case behaviour. However, the original Pugh's paper
> introducing skip lists did a comparison between skip list and
> self-balancing trees.
That's true (I remember being fascinated by that paper) but it was written in
1989 and some things have changed since. In particular, memory locality matters
more today. Random number generators have made strides but they are often
expensive, and the one in Emacs isn't likely to be particularly fast (nor good).
Skip-lists have some properties that make them interesting for some uses, such
as being fairly amenable to lock-free operation, but that's not of interest in
Emacs today.
> The common operations on the cache are: (1) random element
> lookups (element at point while user edits the Org buffer); (2)
> iterating cache elements one-by-one to filter them (agenda views,
> column views, sparse trees, etc); (3) Deleting or inserting a series of
> adjacent elements upon file modification (normal user edits). (4)
> Combination of cache iteration with adding/removing elements in-place
> (agenda lookups when parts of the cache should be calculated as we find
> that some elements are not yet in cache).
(3) is the killer here since it rules out hash tables, or does it? Is
maintaining element order a strict requirement? Otherwise just go with hash
tables.
> The benchmarks (see code below) of skip list vs. AVL-tree give the
> following results:
First let me commend you for making actual measurements! Of course, the usual
benchmarking caveats apply.
> - Insertion of 200k subsequent numbers (0, 1, 2, ...): 0.44 sec (often
> triggers garbage collection)
Most data structures allow for fast insertion of consecutive elements as a
special operation, typically in more-or-less linear time. I'm sure we could add
it to the standard AVL implementation if it would make a difference.
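For instance, when the input is already sorted, a balanced binary tree can be built directly in linear time with no rebalancing at all. A sketch, using a made-up [LEFT VALUE RIGHT] node layout rather than the one in avl-tree.el:

```elisp
(defun my-tree-from-sorted (vec)
  "Sketch: build a balanced binary tree from the sorted vector VEC.
Nodes are [LEFT VALUE RIGHT].  Runs in O(n); no rebalancing is needed
because each recursive step picks the midpoint as the root."
  (letrec ((build (lambda (lo hi)       ; half-open range [lo, hi)
                    (when (< lo hi)
                      (let ((mid (/ (+ lo hi) 2)))
                        (vector (funcall build lo mid)
                                (aref vec mid)
                                (funcall build (1+ mid) hi)))))))
    (funcall build 0 (length vec))))

;; (my-tree-from-sorted [1 2 3]) => [[nil 1 nil] 2 [nil 3 nil]]
```

(The recursion here is only O(log n) deep, so the C stack is not at risk.)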
Since I'm not sure what your keys are (integers, strings, ...) nor their
distribution (dense, sparse, clustered...) it's hard to recommend precise
structures but some alternatives are:
- red-black-trees: typically similar to AVL trees but often easier to implement
- radix trees: very fast for dense or sparse-but-clustered integers
- B-trees: often very fast because of their good locality, many variants
- tries: many variants, typically tailored for a specific key type (eg ASCII
strings).
I'm not sure if your usage benefits from persistence but it's nice to have in
many cases.
Then there are implementation aspects to consider. For example, suppose you
want to store three values in a tree node (value and left and right subtrees).
You could use:
- vector: [L R X]
- list: (L R X)
- improper list: (L R . X)
They all have their advantages and drawbacks, and you don't know which one is
best until you measure. The result is often counter-intuitive. Also remember
that Elisp has efficient boolean vectors, so if you need an extra bit per node
it may be more economical to gather them all in one vector instead of wasting
8-16 bytes for each bit. Even bignums can be harnessed for the occasional
profit.
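To make the trade-off concrete, here is a rough sketch of the three node layouts, plus packing per-node bits into a bool-vector. The names are illustrative and the sizes are approximate figures for a 64-bit build:

```elisp
;; Three layouts for a node holding value X and subtrees L and R:
(defun node-vec (l r x) (vector l r x))         ; header + 3 slots = 4 words
(defun node-list (l r x) (list l r x))          ; 3 conses = 6 words
(defun node-dotted (l r x) (cons l (cons r x))) ; 2 conses = 4 words

;; Matching accessors for the improper-list layout:
(defun node-dotted-left (node) (car node))
(defun node-dotted-right (node) (cadr node))
(defun node-dotted-value (node) (cddr node))

;; One extra flag per node (say, a colour bit): instead of spending a
;; whole slot in every node, pack all the bits into one bool-vector
;; indexed by node number.
(let ((flags (make-bool-vector 1000 nil)))
  (aset flags 42 t)
  (aref flags 42))   ; => t
```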
> In summary, skip list appears to give benefit in terms to overall speed
> (within context of grammar cache). The current implementation of skip
> list + org-element-cache is slightly faster compared to AVL tree and
> feels much snappier.
It could very well be, but it's far from a double-blind trial. Few things are
without limit; one of them is Man's capacity for self-deception.
> The other aspect is ease of usage. AVL tree is trickier to work with
> (see below code example).
Looks like you are using it in a somewhat unorthodox way; those double hyphens
are there for a reason. If you implement a data structure for your own use, you
will of course make it handy for those specific purposes. It is not uncommon
for data structures to provide cursors that help with your kind of local
in-place editing.