bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node

bug-gnu-emacs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node

From:	Yuan Fu
Subject:	bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node
Date:	Mon, 27 Feb 2023 14:37:22 -0800


> On Feb 27, 2023, at 6:29 AM, Mickey Petersen <mickey@masteringemacs.org> 
> wrote:
> 
> 
> Yuan Fu <casouri@gmail.com> writes:
> 
>>> On Feb 27, 2023, at 12:22 AM, Mickey Petersen <mickey@masteringemacs.org> 
>>> wrote:
>>> 
>>> 
>>> Yuan Fu <casouri@gmail.com> writes:
>>> 
>>>>> On Feb 26, 2023, at 1:41 AM, Mickey Petersen <mickey@masteringemacs.org> 
>>>>> wrote:
>>>>> 
>>>>> 
>>>>> Yuan Fu <casouri@gmail.com> writes:
>>>>> 
>>>>>>> GC has historically never called xmalloc, so the profiler will
>>>>>>> likely
>>>>>>> crash upon growing the mark stack as well.  I guess another
>>>>>>> important
>>>>>>> question is why ts_delete_parser is calling xmalloc.
>>>>>>> 
>>>>>> 
>>>>>>> As you see, when we call ts_tree_delete, it calls
>>>>>>> ts_subtree_release,
>>>>>>> which in turn calls malloc (redirected into our xmalloc).  Is this
>>>>>>> expected?  Can you look in the tree-sitter sources and verify that
>>>>>>> this is OK?
>>>>>> 
>>>>>> I had a look, and it seems legit. In tree-sitter, a TSTree (or more
>>>>>> precisely, a Subtree) is just some inlined data plus a refcounted
>>>>>> pointer to the complete data. This way multiple trees share common
>>>>>> subtrees/nodes. Eg, when incrementally parsing, you pass in an old
>>>>>> tree and get a new tree, these two trees will share the unchanged part
>>>>>> of the tree.
>>>>> 
>>>>> Would that mean we could possibly preserve node instances -- either
>>>>> the real TS ones, or an Emacs-created facsimile -- between
>>>>> incremental parsing? That would be useful for refactoring.
>>>> 
>>>> What kind of exact interface (function) do you want? The
>>>> treesit-node-outdated error is solely Emacs’s product, tree-sitter
>>>> itself doesn’t mark a node outdated. It is possible for Emacs to not
>>>> delete the old tree and give it to you, or allow you to access
>>>> information of an outdated node.
>>> 
>>> OK, so let me explain:
>>> 
>>> Touching the buffer for any reason invalidates the whole tree; that's
>>> not good. It's not good, because a lot of the information may still be
>>> useful and viable. Outdating the node is not a bad idea as it avoids a
>>> lot of 'traps' around accidental modifications that can corrupt things
>>> without the developer's knowledge.
>>> 
>>> I'd like to be able to access all the information possible; perhaps
>>> behind a flag variable like `treesit-allow-outdated-node-access'. What
>>> I'm really mostly interested in is:
>>> 
>>> - How well the node references handle changes in byte positions in TS.
>> 
>> They don’t handle position changes. If the buffer content changed, we
>> need to reparse. Once we reparsed the buffer, a new tree is
>> born. While it is true that the new tree shares some node with the old
>> tree, tree-sitter does not expose any function or information that
>> tells you which node in the new tree is “the same” as which node in
>> the old tree; nor does it tell you whether a node in the old tree
>> still “exists” in the new tree.
>> 
>> Now, there does exist a function (in tree-sitter’s API) that allows
>> you to “edit” a node with position changes. But a) I’m not sure how
>> does it handle the case where the node is deleted by the change and b)
>> it is not very useful because once you reparse the buffer, the new
>> tree is completely independent from the old tree (ignoring the
>> implementation detail which is not exposed).
>> 
>>> 
>>> - Does changing something at X shift (like a `point-marker`) everything
>>> below it? Does an outdated node correctly reference its new location
>>> and state, such as changes to children or its position in the tree?
>> 
>> Like I said above, any buffer change will create a new tree with no relation 
>> to the old tree, so there is no shifting.
>> 
>> And there really isn’t a “new location”: we don’t know if the old node
>> is still in the new tree. Mind you, even if the node is completely
>> outside of the changed region, it can still disappear from the new
>> tree because of change of its surrounding context. For example, in the
>> following C code:
>> 
>> /*
>> int c = 1;
>> 
>> If I insert a closing comment delimiter, and buffer becomes
>> 
>> /*
>> int c = 1;
>> */
>> 
>> Even though int c = 1; is not in the changed range, nor did it’s
>> position move, all those nodes (int, c, =, etc) are not in the new
>> tree anymore, because the whole thing becomes a comment.
>> 
>> I made any access to outdated nodes error because there really isn’t
>> any good reason to use them, at least I didn’t think of any at the
>> time. And make them error out should help people catch errors.
>> 
>>> 
>>> Right now, Combobulate can make a proxy node, which essentially
>>> captures the basics of a live node and stores it in a defstruct. That
>>> way I can at least retain the start/end, type, text, etc. of a node
>>> and still do light refactoring without contorting myself to do things
>>> in a particular order, which is not always possible (like delaying
>>> editing to the very end.)
>> 
>> IIUC, you want to do some very minor whitespace edit to the buffer
>> which doesn’t really change the parse tree, so you don’t want the
>> nodes to be invalidated for no good reason? Not erroring on outdated
>> nodes is easy. As you said, we can add a
>> treesit-inhibit-error-outdated variable. But not it’s not so easy to
>> automatically update outdated nodes’ positions (with aforementioned
>> tree-sitter function). However, if you are making those changes, you
>> much know how to adjust your nodes position, right? So maybe it isn’t
>> a must-have for your purpose.
> 
> It's a good point, but it's also easy to create a scenario where you
> at least want to keep the position and esp. the type and text (for
> reporting information to the user, or similar.)

I should be clearer. I meant that treesit-inhibit-error-outdated is reasonable 
and easy to implement. So if you want we can add it. OTOH auto-updating 
outdated nodes with position information is nontrivial, and might not be 
must-have for your purpose.

> 
> My main interest is now refactoring and how to best do it. If TS can
> do some of it, then all the better. I realise it was never meant to,
> but if we can continue accessing the information contained in a node
> even if it is outdated, then that could be useful, however niche.

I guess “refactoring” includes not only whitespace changes but also some 
structural changes like slurping (or whatever it’s called), right? If you want 
to do structural changes, tree-sitter probably can’t help you much, as you 
observed. Maybe it’s better to “export” the tree-sitter tree to your own tree 
and do transformations with it? Maybe that’s already what you does now.

> Currently I use overlays and point markers, but they are not
> infallible.

Yuan

[Prev in Thread]

Current Thread

[Next in Thread]

bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node, (continued)
- bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node, Yuan Fu, 2023/02/24
  - bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node, Eli Zaretskii, 2023/02/25
    - bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node, Yuan Fu, 2023/02/25

Prev by Date: bug#61850: 29.0.60; Emacs 29 removes the ability to suppress ediff frames
Next by Date: bug#61817: 30.0.50; Project.el finds incorrect project roots in git worktrees
Previous by thread: bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node
Next by thread: bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node
Index(es):
- Date
- Thread