emacs-devel


From: Stephen Leake
Subject: Re: [SPAM UNSURE] Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
Date: Fri, 03 Apr 2020 10:11:05 -0800
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (windows-nt)

Eli Zaretskii <address@hidden> writes:

>> From: Stephen Leake <address@hidden>
>> Date: Thu, 02 Apr 2020 18:49:07 -0800
>>
>> > I think we should try to avoid both copying and encoding the text we
>> > send to the parser.  Both operations are expensive and require memory
>> > allocation.
>>
>> I don't understand what the alternative is. The parser imposes the
>> reasonable requirement that the input text be utf-8 (or possibly some
>> other standard format). Emacs raw buffer text is not utf-8, so we must
>> do some encoding.
>
> Emacs represents buffer text as a superset of UTF-8, with the
> violations of strict UTF-8 being very rare in buffers that hold
> program sources.  The function we can provide that lets tree-sitter
> access buffer text can cope with those violations,

Ok. "cope with those violations" = "do some encoding".

We can avoid copying _if_ the encoding does not change character
positions, or somehow preserves positions, for example with an auxiliary
table of changes due to encoding.
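
A minimal sketch of such an auxiliary table, in Python for brevity. It
is an illustration under stated assumptions, not a proposal for the
actual Emacs code: the names `encode_with_table` and `to_source_offset`
are hypothetical, "violation" is taken to mean any byte that Python's
strict UTF-8 decoder rejects, and each such byte is replaced by U+FFFD,
which changes the length and so requires the table.

```python
import bisect

REPLACEMENT = "\ufffd".encode("utf-8")  # U+FFFD is 3 bytes in UTF-8


def encode_with_table(data: bytes):
    """Return (strict UTF-8 bytes, table) for DATA.

    Each table entry is (encoded_offset, shift): from encoded_offset
    onward, encoded_offset - shift is the offset in the original DATA.
    """
    out = bytearray()
    table = []
    pos = 0
    shift = 0  # running difference: encoded offset minus source offset
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")  # strict; raises on a violation
            out += data[pos:]
            break
        except UnicodeDecodeError as err:
            start, end = pos + err.start, pos + err.end
            out += data[pos:start]      # valid prefix is kept unchanged
            out += REPLACEMENT          # invalid bytes become U+FFFD
            shift += len(REPLACEMENT) - (end - start)
            table.append((len(out), shift))
            pos = end
    return bytes(out), table


def to_source_offset(table, encoded_offset: int) -> int:
    """Map an offset reported by the parser back to the original bytes.

    Offsets that fall inside a replacement are not meaningful.
    """
    i = bisect.bisect_right(table, (encoded_offset, float("inf"))) - 1
    return encoded_offset - (table[i][1] if i >= 0 else 0)
```

The table stays empty for text that is already strict UTF-8, so in the
common case the lookup costs almost nothing; this sketch still copies
the text, though, which is the cost a position-preserving encoding
would avoid.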

Coping with violations in the lexer would make it much easier to avoid
changing character positions; it is easy to simply ignore bytes there.

wisi makes it easy to implement this in the lexer (because it uses
re2c), although currently there is no way to make that language-specific
(that would be an enhancement).

https://tree-sitter.github.io/tree-sitter/creating-parsers#external-scanners
describes the facility for enhancing the tree-sitter lexer (aka
scanner). That is not convenient for handling this issue, so we'd have
to request (and/or provide) an enhancement.

We cannot avoid encoding (either in the read function provided to
tree-sitter, or in the tree-sitter lexer), but the encoding may be very
simple and efficient.
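
As an illustration of how simple a position-preserving encoding could
be, here is a sketch in Python that overwrites every byte rejected by a
strict UTF-8 decoder with a one-byte ASCII filler, so all byte offsets
stay the same and no auxiliary table is needed. The function name and
the choice of `?` as filler are assumptions made for the example; a
real read function would work on chunks of buffer text in C.

```python
def sanitize_preserving_offsets(data: bytes, filler: bytes = b"?") -> bytes:
    """Return strict UTF-8 of the same length as DATA.

    Bytes that are not part of a valid UTF-8 sequence are replaced
    one-for-one by FILLER; everything else is left untouched.
    """
    assert len(filler) == 1, "filler must be a single byte to keep offsets"
    out = bytearray(data)
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")  # strict; raises on a violation
            break                       # the rest is already valid
        except UnicodeDecodeError as err:
            start, end = pos + err.start, pos + err.end
            out[start:end] = filler * (end - start)
            pos = end
    return bytes(out)
```

Because the output has the same length as the input, any position the
parser reports can be used directly as a position in the original
text. The price is that the parser sees the filler instead of the
original bytes, which is only acceptable if such bytes are as rare in
program sources as Eli says.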

--
-- Stephe