[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Tree-sitter navigation time grows as sqrt(line-number)
From: |
Yuan Fu |
Subject: |
Re: Tree-sitter navigation time grows as sqrt(line-number) |
Date: |
Thu, 17 Aug 2023 20:00:50 -0700 |
> On Aug 16, 2023, at 9:01 PM, JD Smith <jdtsmith@gmail.com> wrote:
>
> I recently posted about the high variability of Emacs 29’s tree-sitter
> navigation performance within a file. I decided to conduct a simple test on
> a large python file of about 8400 lines to see if I could learn more. The
> test is as follows: at the start of each line, locate the current syntax
> node, and starting from it, navigate up to the root node via
> `treesit-node-parent’.
>
> I was surprised to find that the time this takes grows as sqrt(N), for line
> number N. This leads to performance variability of >100x for code that needs
> to walk the local syntax tree in large files. Such variability can make
> performance projections and optimizations for latency-sensitive uses of
> tree-sitter (e.g. via font-lock) tricky.
>
> I’m unclear whether this is fundamental to the tree-sitter parse/tree
> algorithm, or if the scaling comes from Emacs’ TS implementation. It does
> vaguely remind me of similar scaling with an old line-numbering algorithm,
> where lines were always being counted from the beginning of the buffer, so
> very fast at the front, and very slow near the end.
>
> Code and details here:
>
>
> https://gist.github.com/jdtsmith/7fa6263a13559d587abb51827e6ae472
I’m not entirely surprised. In the parse tree that tree-sitter generates is a
DAG, where the parent node has pointers to the child nodes, but not the other
way around. That means to go to the parent node from a child node, what
tree-sitter actually does is go down from the root node until it hits the
parent node. This process is linear to the height of the tree.
Also, getting the node at point isn’t free either. To get the node at point, we
actually iterates from the first child node of the root node until reaching one
that contains the point, then iterate from the first child node of that node
until reaching one that contains the point, etc, until we reach a leaf node. So
log(N) time complexity is expected.
Theses are fundamental limits of tree-sitter, unless it changes its data
structure. I’m not too worried tho, because IIRC the absolute time is very
short. The 100x variability doesn’t mean much if the 100x is still very fast.
Yuan
- Re: Tree-sitter navigation time grows as sqrt(line-number), (continued)
- Re: Tree-sitter navigation time grows as sqrt(line-number), JD Smith, 2023/08/17
- Re: Tree-sitter navigation time grows as sqrt(line-number), Dmitry Gutov, 2023/08/17
- Re: Tree-sitter navigation time grows as sqrt(line-number), Dmitry Gutov, 2023/08/17
- Re: Tree-sitter navigation time grows as sqrt(line-number), JD Smith, 2023/08/19
- Re: Tree-sitter navigation time grows as sqrt(line-number), Yuan Fu, 2023/08/19
- Re: Tree-sitter navigation time grows as sqrt(line-number), JD Smith, 2023/08/19
- Re: Tree-sitter navigation time grows as sqrt(line-number), Dmitry Gutov, 2023/08/19
- Re: Tree-sitter navigation time grows as sqrt(line-number), Yuan Fu, 2023/08/19
- Re: Tree-sitter navigation time grows as sqrt(line-number), Eli Zaretskii, 2023/08/20
Re: Tree-sitter navigation time grows as sqrt(line-number),
Yuan Fu <=
Re: Tree-sitter navigation time grows as sqrt(line-number), Dmitry Gutov, 2023/08/17