emacs-devel



From: Stephen Leake
Subject: Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
Date: Thu, 02 Apr 2020 18:27:59 -0800
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (windows-nt)

Eli Zaretskii <address@hidden> writes:

>> From: Stephen Leake <address@hidden>
>> Date: Wed, 01 Apr 2020 15:38:26 -0800
>> 
>> Eli Zaretskii <address@hidden> writes:
>> 
>> > Also, direct access to buffer text generally means we must make sure
>> > GC never runs as long as pointers to buffer text are lying around.
>> > Can any Lisp run between calls to the reader function that the
>> > tree-sitter parser calls to access the buffer text?  
>> 
>> If the parser copies the text into an internal buffer, that reader
>> function should only be called once per call to the parser.
>
> Such copying is not really scalable, and IMO should be avoided.
> During active editing, redisplay runs very frequently, and having to
> copy portions of the buffer, let alone all of it, each time, which
> necessarily requires memory allocation, consing of Lisp objects, etc.,
> will produce significant memory pressure, expensive heap
> allocations/deallocations, and a lot of GC.  Recall that on many
> modern platforms Emacs doesn't really return memory to the system,
> which means we risk increasing the memory footprint, and create
> system-wide memory pressure.  It isn't a catastrophe, but we should
> try to avoid it if possible.

Ok. I know very little about the internal storage of text in Emacs.
As far as I understand, there are at least two strings with a gap at the
current edit point; if we pass a simple pointer to tree-sitter, it will
have to handle the gap. You mention "consing of Lisp objects" above,
which suggests to me that the text is stored in a more complex
structure. How can we give tree-sitter direct access to that?
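
To make the gap question concrete, here is a minimal sketch of a reader
that hands out contiguous chunks and never steps into the gap. It is a
toy model, not Emacs's actual buffer code: the names GapBuffer, move_gap,
read_chunk and read_all are hypothetical, and the layout (one byte array
with a single gap) is my assumption about the general technique.

```python
class GapBuffer:
    """Toy gap buffer: one bytearray holding the text plus an unused gap."""

    def __init__(self, text: bytes, gap_size: int = 16):
        self.data = bytearray(text) + bytearray(gap_size)
        self.gap_start = len(text)            # gap initially sits at the end
        self.gap_end = len(text) + gap_size

    def __len__(self):
        """Length of the text, not counting the gap."""
        return len(self.data) - (self.gap_end - self.gap_start)

    def move_gap(self, pos: int):
        """Move the gap so that it starts at logical position pos."""
        if pos < self.gap_start:
            n = self.gap_start - pos
            # Shift the text between pos and the gap to just after the gap.
            self.data[self.gap_end - n:self.gap_end] = self.data[pos:self.gap_start]
            self.gap_start -= n
            self.gap_end -= n
        elif pos > self.gap_start:
            n = pos - self.gap_start
            # Shift the text just after the gap to just before it.
            self.data[self.gap_start:self.gap_start + n] = self.data[self.gap_end:self.gap_end + n]
            self.gap_start += n
            self.gap_end += n

    def read_chunk(self, pos: int) -> memoryview:
        """Return a zero-copy view of the contiguous text starting at pos.

        The view stops at the gap or at the end of the text; an empty view
        means end of text.
        """
        view = memoryview(self.data)
        if pos < self.gap_start:
            return view[pos:self.gap_start]
        return view[pos + (self.gap_end - self.gap_start):]


def read_all(buf: GapBuffer) -> bytes:
    """Drive the reader the way a parser might: chunk by chunk until empty.

    The bytes() copy here is only so the result can be inspected.
    """
    pos, pieces = 0, []
    while True:
        chunk = buf.read_chunk(pos)
        if len(chunk) == 0:
            break
        pieces.append(bytes(chunk))
        pos += len(chunk)
    return b"".join(pieces)
```

With the gap in the middle of the text, the parser sees two chunks and
the text itself is never copied. The views are only valid until the next
edit, which is the same concern as Lisp running between reader calls.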

Avoiding _all_ copying is impossible; the parser must store the contents
of each token in some way. Typically that is done by storing
pointers/indices into the text buffer that contains the entire text.
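
As an illustration of the index approach, here is a small sketch of a
lexer whose tokens carry only a kind and a pair of offsets into the
original text. The Token type and the tokenize function are hypothetical
names made up for this example, and the "grammar" (words and single
punctuation characters) is deliberately trivial.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str    # "word" or "punct"
    start: int   # offset of the first character in the source text
    end: int     # offset one past the last character


def tokenize(text: str) -> List[Token]:
    """Split text into tokens that reference it by offsets only."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        start = i
        if text[i].isalnum() or text[i] == "_":
            while i < len(text) and (text[i].isalnum() or text[i] == "_"):
                i += 1
            kind = "word"
        else:
            i += 1
            kind = "punct"
        tokens.append(Token(kind, start, i))
    return tokens
```

The token text is recovered on demand with text[tok.start:tok.end], so
nothing is copied at lex time. The price is that the offsets go stale
when the text before them is edited.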

>> In sum, the short answer is "yes, you must parse the whole file, unless
>> your language is particularly simple".
>
> Funny, my conclusion from reading your detailed description was
> entirely different.

I need more than that to respond in a helpful way.

>> In general, each parser library, and even each grammar author, will have
>> different representations for the syntax tree.
>> 
>> So if we want to support different parsers, I think it is best to define
>> the Emacs "parser API" as "give text to parser; accept text properties
>> from parser".
>
> Yes, something like that.  It's probably enough to accept a list of
> regions with syntactic attributes.

Ok, good.
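
To check that we mean the same thing, here is a rough sketch of that
interface: the parser is any function from text to a list of regions,
and the editor side turns regions into per-character properties. All
names here (Region, toy_parser, apply_regions) are hypothetical, and the
toy parser only recognizes double-quoted strings and ';' comments; a real
backend would be tree-sitter or some other parser behind the same shape.

```python
from typing import List, NamedTuple, Optional


class Region(NamedTuple):
    start: int       # offset of the first character
    end: int         # offset one past the last character
    attribute: str   # syntactic attribute, e.g. "string" or "comment"


def toy_parser(text: str) -> List[Region]:
    """Stand-in parser: mark double-quoted strings and ';' line comments."""
    regions = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == ";":
            newline = text.find("\n", i)
            end = len(text) if newline == -1 else newline
            regions.append(Region(i, end, "comment"))
            i = end
        elif ch == '"':
            close = text.find('"', i + 1)
            end = len(text) if close == -1 else close + 1
            regions.append(Region(i, end, "string"))
            i = end
        else:
            i += 1
    return regions


def apply_regions(text: str, regions: List[Region]) -> List[Optional[str]]:
    """Editor side: one property slot per character, filled from regions."""
    props: List[Optional[str]] = [None] * len(text)
    for region in regions:
        for pos in range(region.start, region.end):
            props[pos] = region.attribute
    return props
```

The editor side never sees the parser's syntax tree, only offsets and
attribute names, so swapping parsers does not change it.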

-- 
-- Stephe


