[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node
From: |
Yuan Fu |
Subject: |
bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node |
Date: |
Mon, 27 Feb 2023 14:37:22 -0800 |
> On Feb 27, 2023, at 6:29 AM, Mickey Petersen <mickey@masteringemacs.org>
> wrote:
>
>
> Yuan Fu <casouri@gmail.com> writes:
>
>>> On Feb 27, 2023, at 12:22 AM, Mickey Petersen <mickey@masteringemacs.org>
>>> wrote:
>>>
>>>
>>> Yuan Fu <casouri@gmail.com> writes:
>>>
>>>>> On Feb 26, 2023, at 1:41 AM, Mickey Petersen <mickey@masteringemacs.org>
>>>>> wrote:
>>>>>
>>>>>
>>>>> Yuan Fu <casouri@gmail.com> writes:
>>>>>
>>>>>>> GC has historically never called xmalloc, so the profiler will
>>>>>>> likely
>>>>>>> crash upon growing the mark stack as well. I guess another
>>>>>>> important
>>>>>>> question is why ts_delete_parser is calling xmalloc.
>>>>>>>
>>>>>>
>>>>>>> As you see, when we call ts_tree_delete, it calls
>>>>>>> ts_subtree_release,
>>>>>>> which in turn calls malloc (redirected into our xmalloc). Is this
>>>>>>> expected? Can you look in the tree-sitter sources and verify that
>>>>>>> this is OK?
>>>>>>
>>>>>> I had a look, and it seems legit. In tree-sitter, a TSTree (or more
>>>>>> precisely, a Subtree) is just some inlined data plus a refcounted
>>>>>> pointer to the complete data. This way multiple trees share common
>>>>>> subtrees/nodes. Eg, when incrementally parsing, you pass in an old
>>>>>> tree and get a new tree, these two trees will share the unchanged part
>>>>>> of the tree.
>>>>>
>>>>> Would that mean we could possibly preserve node instances -- either
>>>>> the real TS ones, or an Emacs-created facsimile -- between
>>>>> incremental parsing? That would be useful for refactoring.
>>>>
>>>> What kind of exact interface (function) do you want? The
>>>> treesit-node-outdated error is solely Emacs’s product, tree-sitter
>>>> itself doesn’t mark a node outdated. It is possible for Emacs to not
>>>> delete the old tree and give it to you, or allow you to access
>>>> information of an outdated node.
>>>
>>> OK, so let me explain:
>>>
>>> Touching the buffer for any reason invalidates the whole tree; that's
>>> not good. It's not good, because a lot of the information may still be
>>> useful and viable. Outdating the node is not a bad idea as it avoids a
>>> lot of 'traps' around accidental modifications that can corrupt things
>>> without the developer's knowledge.
>>>
>>> I'd like to be able to access all the information possible; perhaps
>>> behind a flag variable like `treesit-allow-outdated-node-access'. What
>>> I'm really mostly interested in is:
>>>
>>> - How well the node references handle changes in byte positions in TS.
>>
>> They don’t handle position changes. If the buffer content changed, we
>> need to reparse. Once we reparsed the buffer, a new tree is
>> born. While it is true that the new tree shares some node with the old
>> tree, tree-sitter does not expose any function or information that
>> tells you which node in the new tree is “the same” as which node in
>> the old tree; nor does it tell you whether a node in the old tree
>> still “exists” in the new tree.
>>
>> Now, there does exist a function (in tree-sitter’s API) that allows
>> you to “edit” a node with position changes. But a) I’m not sure how
>> does it handle the case where the node is deleted by the change and b)
>> it is not very useful because once you reparse the buffer, the new
>> tree is completely independent from the old tree (ignoring the
>> implementation detail which is not exposed).
>>
>>>
>>> - Does changing something at X shift (like a `point-marker`) everything
>>> below it? Does an outdated node correctly reference its new location
>>> and state, such as changes to children or its position in the tree?
>>
>> Like I said above, any buffer change will create a new tree with no relation
>> to the old tree, so there is no shifting.
>>
>> And there really isn’t a “new location”: we don’t know if the old node
>> is still in the new tree. Mind you, even if the node is completely
>> outside of the changed region, it can still disappear from the new
>> tree because of change of its surrounding context. For example, in the
>> following C code:
>>
>> /*
>> int c = 1;
>>
>> If I insert a closing comment delimiter, and buffer becomes
>>
>> /*
>> int c = 1;
>> */
>>
>> Even though int c = 1; is not in the changed range, nor did it’s
>> position move, all those nodes (int, c, =, etc) are not in the new
>> tree anymore, because the whole thing becomes a comment.
>>
>> I made any access to outdated nodes error because there really isn’t
>> any good reason to use them, at least I didn’t think of any at the
>> time. And make them error out should help people catch errors.
>>
>>>
>>> Right now, Combobulate can make a proxy node, which essentially
>>> captures the basics of a live node and stores it in a defstruct. That
>>> way I can at least retain the start/end, type, text, etc. of a node
>>> and still do light refactoring without contorting myself to do things
>>> in a particular order, which is not always possible (like delaying
>>> editing to the very end.)
>>
>> IIUC, you want to do some very minor whitespace edit to the buffer
>> which doesn’t really change the parse tree, so you don’t want the
>> nodes to be invalidated for no good reason? Not erroring on outdated
>> nodes is easy. As you said, we can add a
>> treesit-inhibit-error-outdated variable. But not it’s not so easy to
>> automatically update outdated nodes’ positions (with aforementioned
>> tree-sitter function). However, if you are making those changes, you
>> much know how to adjust your nodes position, right? So maybe it isn’t
>> a must-have for your purpose.
>
> It's a good point, but it's also easy to create a scenario where you
> at least want to keep the position and esp. the type and text (for
> reporting information to the user, or similar.)
I should be clearer. I meant that treesit-inhibit-error-outdated is reasonable
and easy to implement. So if you want we can add it. OTOH auto-updating
outdated nodes with position information is nontrivial, and might not be
must-have for your purpose.
>
> My main interest is now refactoring and how to best do it. If TS can
> do some of it, then all the better. I realise it was never meant to,
> but if we can continue accessing the information contained in a node
> even if it is outdated, then that could be useful, however niche.
I guess “refactoring” includes not only whitespace changes but also some
structural changes like slurping (or whatever it’s called), right? If you want
to do structural changes, tree-sitter probably can’t help you much, as you
observed. Maybe it’s better to “export” the tree-sitter tree to your own tree
and do transformations with it? Maybe that’s already what you does now.
> Currently I use overlays and point markers, but they are not
> infallible.
Yuan
- bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node, (continued)
- bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node, Mickey Petersen, 2023/02/26
- bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node, Yuan Fu, 2023/02/26
- bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node, Mickey Petersen, 2023/02/27
- bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node, Yuan Fu, 2023/02/27
- bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node, Mickey Petersen, 2023/02/27
- bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node,
Yuan Fu <=
- bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node, Dmitry Gutov, 2023/02/27
bug#60237: 30.0.50; tree sitter core dumps when I edebug view a node, Yuan Fu, 2023/02/24