emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Tree-sitter navigation time grows as sqrt(line-number)


From: Yuan Fu
Subject: Re: Tree-sitter navigation time grows as sqrt(line-number)
Date: Thu, 17 Aug 2023 20:00:50 -0700


> On Aug 16, 2023, at 9:01 PM, JD Smith <jdtsmith@gmail.com> wrote:
> 
> I recently posted about the high variability of Emacs 29’s tree-sitter 
> navigation performance within a file.  I decided to conduct a simple test on 
> a large python file of about 8400 lines to see if I could learn more.  The 
> test is as follows: at the start of each line, locate the current syntax 
> node, and starting from it, navigate up to the root node via 
> `treesit-node-parent’.  
> 
> I was surprised to find that the time this takes grows as sqrt(N), for line 
> number N.  This leads to performance variability of >100x for code that needs 
> to walk the local syntax tree in large files.  Such variability can make 
> performance projections and optimizations for latency-sensitive uses of 
> tree-sitter (e.g. via font-lock) tricky.  
> 
> I’m unclear whether this is fundamental to the tree-sitter parse/tree 
> algorithm, or if the scaling comes from Emacs’ TS implementation.  It does 
> vaguely remind me of similar scaling with an old line-numbering algorithm, 
> where lines were always being counted from the beginning of the buffer, so 
> very fast at the front, and very slow near the end. 
> 
> Code and details here:
> 
>   
> https://gist.github.com/jdtsmith/7fa6263a13559d587abb51827e6ae472

I’m not entirely surprised. In the parse tree that tree-sitter generates is a 
DAG, where the parent node has pointers to the child nodes, but not the other 
way around. That means to go to the parent node from a child node, what 
tree-sitter actually does is go down from the root node until it hits the 
parent node. This process is linear to the height of the tree.

Also, getting the node at point isn’t free either. To get the node at point, we 
actually iterates from the first child node of the root node until reaching one 
that contains the point, then iterate from the first child node of that node 
until reaching one that contains the point, etc, until we reach a leaf node. So 
log(N) time complexity is expected.

Theses are fundamental limits of tree-sitter, unless it changes its data 
structure. I’m not too worried tho, because IIRC the absolute time is very 
short. The 100x variability doesn’t mean much if the 100x is still very fast.

Yuan


reply via email to

[Prev in Thread] Current Thread [Next in Thread]