emacs-devel

From: Vincenzo Pupillo
Subject: Re: treesitter local parser: huge slowdown and memory usage in a long file
Date: Sat, 20 Apr 2024 21:14:14 +0200

Great job!
I tried your new patch with my usual benchmark, tcpdf.php, and my php-ts-mode 
and it works very well!
Thank you very much
V.

On Saturday, 20 April 2024 04:18:53 CEST, Yuan Fu wrote:
> > > On Feb 18, 2024, at 9:53 PM, Yuan Fu <casouri@gmail.com> wrote:
> > >> On Feb 17, 2024, at 7:37 PM, Dmitry Gutov <dmitry@gutov.dev> wrote:
> > >> 
> > >> On 13/02/2024 10:08, Yuan Fu wrote:
> > >>>> On 12/02/2024 06:16, Yuan Fu wrote:
> > >>>>> Thanks, the culprit is the call to treesit-update-ranges in
> > >>>>> treesit--pre-redisplay, where we don’t pass it any specific range,
> > >>>>> so it updates the range for the whole buffer. Eli, is there any way
> > >>>>> to get a rough estimate of the range that redisplay is refreshing?
> > >>>>> Do you think something like this would work?
> > >>>> 
> > >>>> If we don't update the ranges outside of some interval surrounding
> > >>>> the window, what does that mean for correctness?
> > >>> 
> > >>> If the place of update and the embedded code currently in view belong
> > >>> to the same node in the host language, then when we update ranges for
> > >>> the current window-visible range, the whole node’s range is updated.
> > >>> So at least for this node, the range is correct.
> > >>> If the place of update and the embedded code currently in view belong
> > >>> to different nodes in the host language, then when we update ranges
> > >>> for the current window-visible range, only the visible node’s range
> > >>> is updated.
> > >> 
> > >> Okay. What about positions after the visible part of the buffer? Can
> > >> their ranges be outdated? It's probably okay when the ranges are only
> > >> used for font-lock and syntax-ppss, but I wonder about possible other
> > >> applications (reindenting the whole buffer, for example).
> > > 
> > > It’s the same as positions before the visible part. For reindenting the
> > > whole buffer, treesit-indent-region will update the range for the whole
> > > buffer at the very beginning.
> > > 
> > >>>> Perhaps the mode has a syntax-propertize-function which behaves
> > >>>> differently (as it should) depending on the language at point. Or
> > >>>> different ranges have different syntax tables, something like that.
> > >>>> 
> > >>>> If the ranges, after some edit (perhaps a programmatic one, performed
> > >>>> far from the visible area), are left outdated somewhere around the
> > >>>> beginning of the buffer, do we not risk confusing the syntax-ppss
> > >>>> parser, for example?
> > >>> 
> > >>> That can happen, yes.
> > >>> 
> > >>>> Come to think of it, take treesit-indent: it only updates the ranges
> > >>>> for the current line. But the line's indentation usually depends on
> > >>>> the previous buffer positions, doesn't it?
> > >>> 
> > >>> The range passed to treesit-update-ranges acts as an intersecting
> > >>> range: we capture nodes that intersect with the range and use them to
> > >>> update ranges.
> > >>> If the line to be indented is in an embedded language block, the whole
> > >>> block will be captured and its range will be given to the embedded
> > >>> language parser.
> > >>> We haven’t had any problems so far, mainly because most embedded code
> > >>> blocks are local, and it’s rare for an edit far from the visible
> > >>> portion to affect ranges in a way the user expects to see reflected
> > >>> in the currently visible range.
> > >>> I don’t have any great idea for a better way to update ranges right
> > >>> now. Let me think about that. In the meantime, I’ll push a temporary
> > >>> fix so V’s original problem can be solved.
> > >> 
> > >> I was thinking (since considering the same problem in mmm-mode,
> > >> actually) that it would make sense to either plug into
> > >> syntax-propertize-function, or have a parallel data structure
> > >> similarly tracking the outdated buffer regions, which would only
> > >> update the part of the buffer which had been modified since last time.
> > >> 
> > >> Dealing with the "remainder" of the buffer might be trickier, but maybe
> > >> some heuristic which would help detect the "no changes" case could be
> > >> implemented.
> > > 
> > > Yeah, something similar to syntax-ppss or jit-lock. Or maybe it can be
> > > avoided, since the current on-demand range update has been working fine,
> > > until we added treesit--pre-redisplay for syntax-ppss.
> > 
> > This is actually a bit involved, because there could be multiple layers
> > of parsers: the host language sets ranges for a local parser, and the
> > local parser can set ranges for a further nested parser. E.g., we might
> > have a markdown parser for parsing doc-comments, and inside the markdown
> > there could be code blocks which require another level of nested parser.
> > 
> > This use-case is a bit advanced but we definitely need to support it in
> > our design. And my brain is twisted by all the dependencies and ranges.
> > If you guys have some ideas they’ll be most welcome :-)
> 
> I believe I’ve found a good way to solve this problem. I pushed the changes
> to master.
> 
> Basically I added a function treesit-parser-changed-ranges that can directly
> return the changed ranges from the last reparse. This means we don’t need to
> use notifiers to get those changed ranges anymore. Then in
> treesit--pre-redisplay, we reparse the primary parser and get the changed
> ranges from it.
> 
> Once we have the changed ranges, we update the other, non-primary parsers’
> ranges, but only within the changed ranges. Originally we were updating
> those parsers’ ranges on the whole buffer, which led to the slowdown. Then
> we had to use some workaround to solve this. Now the workaround isn’t
> needed anymore.
> 
> I also removed some notifier functions and moved their work into
> treesit--pre-redisplay.
> 
> Yuan
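
For readers following the thread, the idea in the quoted message (update embedded parsers' ranges only where the primary parser reports changes, instead of over the whole buffer) can be modelled in a few lines. This is only an illustrative sketch in Python, not the Emacs implementation: the names `intersects` and `blocks_to_update` are hypothetical, ranges are assumed to be half-open (start, end) pairs of buffer positions, and the real code works on tree-sitter nodes rather than plain tuples.

```python
from typing import List, Tuple

# Assumption for this sketch: a range is a half-open (start, end) pair.
Range = Tuple[int, int]


def intersects(a: Range, b: Range) -> bool:
    """Return True if two half-open ranges share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def blocks_to_update(embedded_blocks: List[Range],
                     changed_ranges: List[Range]) -> List[Range]:
    """Select only the embedded blocks that touch a changed range.

    embedded_blocks stands in for the host-language nodes that contain
    embedded code; changed_ranges stands in for what the primary parser
    reports after a reparse. Blocks outside every changed range are
    skipped, which avoids scanning the whole buffer.
    """
    return [block for block in embedded_blocks
            if any(intersects(block, changed) for changed in changed_ranges)]


if __name__ == "__main__":
    blocks = [(10, 50), (200, 260), (9000, 9100)]
    changed = [(220, 230)]
    # Only the block containing the edit needs its range refreshed.
    print(blocks_to_update(blocks, changed))  # prints [(200, 260)]
```

With an empty list of changed ranges nothing is selected, which corresponds to the "no changes" case mentioned earlier in the thread.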
