emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Tree-sitter navigation time grows as sqrt(line-number)


From: Yuan Fu
Subject: Re: Tree-sitter navigation time grows as sqrt(line-number)
Date: Thu, 17 Aug 2023 22:20:55 -0700

>> to go to the parent node from a child node, what tree-sitter actually does 
>> is go down from the root node until it hits the parent node. This process is 
>> linear to the height of the tree.
> 
> Do you mean linear in the width of the tree?  I’ve color-coded the plot by 
> tree depth (i.e. how many ancestors a given node has back to root).  Or maybe 
> you are thinking of the tree lying on its side (like vundo ;)?

Going down from the root node is proportion to the height of the tree, no?

> 
>> Also, getting the node at point isn’t free either. To get the node at point, 
>> we actually iterates from the first child node of the root node until 
>> reaching one that contains the point, then iterate from the first child node 
>> of that node until reaching one that contains the point, etc, until we reach 
>> a leaf node. So log(N) time complexity is expected.
> 
> I tested node-at-point on the same file, and it is quite fast, with worst 
> case performance only 30µs (vs. 3ms for the full parent navigation), and 
> growing very slowly, between sqrt(log(N)) and log(N).  Check the gist for a 
> new figure.
> 
> Unless I am misunderstanding, for the common case of finding parent of the 
> node at point, it seems the algorithm you describe could be tweaked to work 
> well.  I.e. instead of "stop when reaching a leaf node containing point", 
> just "stop when you reach a node containing point who has the original node 
> as a child”?   This should give (hypothetical) “parent-of-node-at-point” the 
> same great speed as node-at-point.  

I should’ve been more clear. For finding the parent of a node x, we stop at the 
parent of x. We only go to the leaf node when finding the node at point, which 
always returns a leaf node.

> 
> Then “parent-of-node-at-point-until” could do something quite clever: 
> accumulate parent nodes all the way from root to the child’s direct parent 
> into a list (same low cost, modulo the node storage).  Then run the predicate 
> on that list (in reverse), returning the first match.  Could of course return 
> that full list too (“ancestors-of-node-at-point”), for other uses.  These 
> should all be quite fast compared to a full breadth and depth searching of 
> every nook and cranny in the syntax tree that node-parent seems to do (having 
> no positional information to wield).

Sure, there hasn’t been a clever version because so far no one have complained 
they can’t find the parent of a node fast enough. Also I doubt the amount of 
gain we can get with a more clever algorithm. You can try implement one and 
benchmark it. I’m curious to know the result :-)

> 
>> I’m not too worried tho, because IIRC the absolute time is very short. The 
>> 100x variability doesn’t mean much if the 100x is still very fast.
> 
> 
> This is on a brand new fast machine, and 3ms is pretty slow for things that 
> need to run dozens of times per keystroke (e.g. font-lock).

Font-lock doesn’t need to find the parent of a node. So that hasn’t been a 
problem. In use-cases where finding the parent of a node is useful, 3ms hasn’t 
been a problem AFAIK. (Well, I guess indentation could benefit from a faster 
parent-finding function.)

> 
>> These are fundamental limits of tree-sitter, unless it changes its data 
>> structure. 
> 
> That is an interesting limitation for sure.  It seems that perhaps each 
> node's start..end information can be really helpful here for winnowing the 
> tree, when you are mostly concerned about nodes relevant to and covering 
> particular positions in the buffer.

> BTW, https://tree-sitter.github.io/tree-sitter/using-parsers says:
>
> A TSNode represents a single node in the syntax tree. It tracks its start and 
> end positions in the source code, as well as its relation to other nodes like 
> its parent, siblings and children.

I’m not entirely sure what exactly do you have in mind. A node’s start and end 
position doesn’t really help us finding it. Suppose you have an array of 
numbers, and you know one of the numbers is 3. How do you find the index of the 
number 3 in the array?

While the documentation say it tracks its relation to other nodes, I think it’s 
more pedagogical than factual. Behind the scenes, tree-sitter’s own node API 
does the same thing as I described: it goes from the root node and traverses 
down.

Yuan






reply via email to

[Prev in Thread] Current Thread [Next in Thread]