Re: emacs-tree-sitter and Emacs

emacs-devel

From:	Stephen Leake
Subject:	Re: emacs-tree-sitter and Emacs
Date:	Thu, 02 Apr 2020 17:55:36 -0800
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/26.2 (windows-nt)
Eli Zaretskii <address@hidden> writes:

>> From: Stephen Leake <address@hidden>
>> Date: Wed, 01 Apr 2020 11:51:40 -0800
>> 
>> Eli Zaretskii <address@hidden> writes:
>> 
>> > Can you tell in more detail why you need to rely on these hooks?  They
>> > shouldn't be necessary, AFAIU.  
>> 
>> It is an optimization choice.
>> 
>> In an unmodified buffer that is smaller than 100,000 characters
>> (the default setting of wisi-partial-parse-threshold), the entire buffer is
>> parsed once; that applies faces to all the Ada identifiers that need
>> faces (standard font-lock regexp handles the reserved words). Then when
>> font-lock fontifies a region, no parsing is needed.
>
> But why do you need that initial full parse in the first place?  Is
> parsing parts of the buffer so much harder?

Because the parser must see a complete top level grammar statement. In
Ada, that's the whole file; a typical file looks like:

package Nifty is

    type Foo is ...;

    function Function_1 is ...;

end Nifty;

The parser needs to see all of the "package" declaration. Java files and
C++ header files are similar; each is typically a single class or
namespace. In C++ and C body files, there are lots of small
declarations, and you could parse each one of those independently, but
_only_ if Emacs can find the start and end of each, which is hard.

In addition, to properly compute indent, you need the fully nested
context. Computing faces usually doesn't need that, but it might in some
cases.

>> Indent is similar; the parse sets text properties holding the indent for
>> each line; indent-region then applies them.
>
> Indent is a different use case: it happens by user command, and thus
> has different time restrictions than redisplay.

Yes, but it is computed by the same parser, so it is relevant.
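To make the "parse once, then serve both font-lock and indent from the
cached results" scheme concrete, here is a minimal Python sketch. It is
a hypothetical model, not wisi's implementation: dicts keyed by line
number stand in for text properties, the "parser" is a toy that only
recognizes a few Ada keywords, and partial parse above the threshold is
not modeled.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for wisi-partial-parse-threshold (default 100,000 chars).
PARTIAL_PARSE_THRESHOLD = 100_000

@dataclass
class Buffer:
    text: str
    parsed: bool = False
    parse_count: int = 0
    # Stand-ins for text properties, keyed by line number.
    face_props: dict = field(default_factory=dict)
    indent_props: dict = field(default_factory=dict)

def parse(buf: Buffer) -> None:
    """Toy parser: records a face and an indent for each line in one pass."""
    buf.parse_count += 1
    buf.face_props.clear()
    buf.indent_props.clear()
    depth = 0
    for lineno, line in enumerate(buf.text.splitlines()):
        words = line.split()
        if words[:1] == ["end"]:
            depth -= 1
        buf.indent_props[lineno] = 3 * depth
        if len(words) > 1 and words[0] in ("package", "function", "procedure"):
            buf.face_props[lineno] = "font-lock-function-name-face"
        if words[:1] == ["package"]:
            depth += 1
    buf.parsed = True

def ensure_parsed(buf: Buffer) -> None:
    """Parse only when the cached properties are stale."""
    if buf.parsed:
        return
    if len(buf.text) >= PARTIAL_PARSE_THRESHOLD:
        raise NotImplementedError("partial parse is not modeled in this sketch")
    parse(buf)

def fontify_region(buf: Buffer, start: int, end: int) -> dict:
    """Read cached faces for lines start..end; no parse if already current."""
    ensure_parsed(buf)
    return {n: f for n, f in buf.face_props.items() if start <= n <= end}

def indent_region(buf: Buffer, start: int, end: int) -> dict:
    """Read cached indents for lines start..end from the same parse."""
    ensure_parsed(buf)
    return {n: i for n, i in buf.indent_props.items() if start <= n <= end}

def after_change(buf: Buffer, new_text: str) -> None:
    """Role of the after-change hook: record that the cache is stale."""
    buf.text = new_text
    buf.parsed = False
```

With this model, any number of fontify/indent calls on an unmodified
buffer cost one parse; only an edit (seen via the hook) triggers another.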

>> If the default setting of jit-lock-defer-time (i.e. nil) is used, then
>> font-lock runs immediately after each change, and the after-change hooks
>> are not needed. But as I have mentioned, I always run with
>> jit-lock-defer-time set to 1.0 (because parsing is not fast enough in
>> some cases), so the change hooks are needed.
>
> AFAIU, tree-sitter and similar parsers are supposed to be much faster,
> so the problem with slow parsing, and all the solutions to alleviate
> that problem, may not be necessary, if they are the only reason for
> using the hooks.

The main reason the ada-mode parser is too slow is the error correction.
tree-sitter appears to have less sophisticated error correction, which
will give worse results with code under edit. The ada-mode parser can be
sped up by specifying parameters that cripple the error correction.

In addition, users will always create huge files (where "huge" means
"bigger than we've seen before"); there are always speed limits. The
reason ada-mode has partial parse is that Eurocontrol has huge files
that they occasionally edit, and always parsing the whole file, even in
the absence of syntax errors, was too slow.
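When fontification is deferred (jit-lock-defer-time set to 1.0), the
after-change hook is what remembers which parts of the buffer changed,
so that a later partial parse knows what to redo. A minimal Python
sketch of that bookkeeping follows; the class and method names are
hypothetical, and it ignores how later insertions shift earlier
positions, which real code must handle.

```python
class ChangeTracker:
    """Hypothetical model of what an after-change hook records while
    fontification is deferred."""

    def __init__(self):
        self.dirty = []  # merged list of (start, end) character ranges

    def after_change(self, beg, end):
        # Called once per edit, like a function on after-change-functions.
        self.dirty.append((beg, end))
        self.dirty.sort()
        merged = []
        for b, e in self.dirty:
            if merged and b <= merged[-1][1]:
                # Overlapping or adjacent: extend the previous range.
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((b, e))
        self.dirty = merged

    def take_dirty(self):
        # Called by the deferred fontifier when the idle delay expires.
        ranges, self.dirty = self.dirty, []
        return ranges
```

Typing "ab" at position 10 and then editing at 50 leaves two ranges to
reparse, (10, 12) and (50, 55), rather than the whole buffer.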

>> The alternative, which does not require after-change hooks, is to always
>> do a full parse for every call of fontify-region or indent-region. That is far too
>> slow.
>
> Even for indentation, a full parse should not be needed.  You need to
> only parse the outermost enclosing function/procedure, right?  That's
> rarely the full buffer, except when the buffer is small.

As discussed above, that depends on your language; in Ada it is _always_
the full buffer. And finding the start of a function in C and C++ is hard.

>> Note that Tree-Sitter requires one full parse of the buffer to generate
>> the parse tree that is later updated incrementally; in an unmodified
>> buffer, only that one parse is needed.
>
> Tree-sitter cannot know what the full buffer holds, so nothing
> prevents us from passing it just part of the buffer.  After all,
> tree-sitter should be able to do a decent job when the part we pass to
> it actually _is_ all we have in the buffer, right?

Same issues as above.

>> > And they cannot pick up every relevant change; for example, what
>> > happens if some face used for font-lock is modified?
>> 
>> Yes, that is a flaw. Not likely to occur in everyday use
>
> Redisplay cannot rely on something being "unlikely", because it's
> expected to produce correct results in all situations.  

The flaw is not in ada-mode's use of a parser or after-change-functions;
it's a general problem with font-lock.

The face values are applied to the buffer text as text properties
containing the symbol that holds the face to be used; for example
(font-lock-face font-lock-function-name-face). If the contents of that
symbol change, then redisplay must be rerun to apply the correct values.
This does _not_ require a reparse; the parser sets the text property,
and that has not changed.
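The indirection described above (the buffer holds a face *name*; what
that name means is looked up at display time) can be modeled in a few
lines of Python. This is a simplified, hypothetical model of the idea,
not how Emacs actually stores or resolves faces; the point is that
changing the face definition requires a redisplay but no reparse.

```python
# Hypothetical face table: face name -> display attributes.
face_table = {"font-lock-function-name-face": {"foreground": "blue"}}

# Set once by the parser; holds only the face name, like a text property.
text_props = {0: "font-lock-function-name-face"}

def redisplay(props):
    """Resolve each face name through the table at display time."""
    return {pos: dict(face_table[name]) for pos, name in props.items()}

before = redisplay(text_props)
# Changing a face definition updates the table, not the buffer text.
face_table["font-lock-function-name-face"] = {"foreground": "red"}
after = redisplay(text_props)
```

`text_props` is identical before and after; only a fresh call to
`redisplay` picks up the new color.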

Use case: A c-mode buffer A is currently displayed in a window in a
frame, it is syntactically correct, and all displayed faces are correct.
In another frame, the user uses 'M-x set-variable' to change the value
of font-lock-function-name-face.

To update the display, something has to trigger redisplay of buffer A. I
don't think using M-x set-variable in a different frame does that.

Switching buffers in a frame does cause a redisplay (to update the menu
and mode line); if M-x set-variable is done in the same frame as buffer
A, the change in font-lock-function-name-face should show up as
expected.

A similar use case would be changing from "light mode" to "dark mode".
That could be done by changing the theme using load-theme; that should
force a redisplay (I assume it does; I have not checked). 

Other than the global face variables, ada-mode does not have any
variables that control faces. Some other modes do; for example, a
variable that sets the level of highlighting to minimal or maximum. In
that case, the font-lock regexps change, and the function that changes
them presumably sets fontified to nil in the current buffer, and should
also force redisplay.
If ada-mode adds a feature like this, there will be a function to change
it (perhaps a custom variable change function) that also forces a
reparse and redisplay.

> I can understand why fontification methods that are too slow want to
> get some help from hooks, but when we design and implement novel
> fontification methods using fast parsers, we should first try doing
> that without any hooks,

Yes, premature optimization is evil. Using tree-sitter to implement
font-lock should start by always parsing the whole buffer for every call
of fontify-region. If that is fast enough, we're done. If not, we can
consider whether parsing a smaller part of the buffer is possible.

Note that the fact that tree-sitter provides incremental parse is a
strong hint that the answer will be "it's not fast enough".

>> >> By default font-lock runs after every character typed
>> >
>> > No, it only runs when redisplay kicks in.  If you type very quickly,
>> > it won't run for every character.  At least AFAIR.
>> 
>> What triggers redisplay?
>
> When Emacs is about to read input, if no input is available, it
> performs redisplay.  IOW, Emacs enters redisplay when it's about to
> become idle.
>
<snip>
>> The elisp manual section "Forcing redisplay" says "Emacs normally tries
>> to redisplay the screen whenever it waits for input." After I type the
>> first character, it is no longer waiting for input, it is processing
>> that character. I assume here "process that char code" includes running
>> after-change-functions, which is (small) elisp code. But I guess after
>> processing that char, before calling redisplay, it checks if there is
>> more input, which should be true if I type fast enough. Perhaps "process
>> that char code" is faster than the combination of my fingers and the
>> keyboard char send rate?
>
> Yes, most probably.

OK, so in practice it is not possible to type fast enough, and
font-lock runs after every character typed.
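The rule Eli describes (redisplay when Emacs is about to become idle,
i.e. when no input is pending) can be modeled as a toy command loop in
Python. The function name and the "batches" input are hypothetical: each
batch is the set of keys already queued when Emacs next checks for
input.

```python
from collections import deque

def command_loop(batches):
    """Toy model: process all pending input, then redisplay once."""
    text = []
    redisplays = 0
    for batch in batches:
        pending = deque(batch)
        while pending:
            # Self-insert; after-change-functions would run here.
            text.append(pending.popleft())
        # Input queue is empty: about to wait, so redisplay (and font-lock) run.
        redisplays += 1
    return "".join(text), redisplays
```

If three keys are already queued, `command_loop([["a", "b", "c"]])`
redisplays once; if each key arrives after the previous one has been
processed, as observed above, `command_loop([["a"], ["b"], ["c"]])`
redisplays three times.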

> In other similar situations (e.g., in Flyspell mode) we wait for some
> non-zero idle time before actually running the code which could react
> to slow typing with annoying messages.

Since font-lock is running a parser, it also detects syntax errors,
which are reported with a fringe mark. I could delay the display of
that fringe mark without delaying font-lock itself. I'll put that on my
list.
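A minimal Python sketch of that idea, in the spirit of the Flyspell
approach Eli mentions: fontification results arrive immediately, but
syntax errors are only shown after a period of idle time since the last
change. The class and its methods are hypothetical; times are passed in
explicitly instead of using real timers, so the logic is easy to check.

```python
class ErrorDisplay:
    """Hypothetical debounce: show syntax-error marks only after
    `delay` seconds with no further fontification."""

    def __init__(self, delay):
        self.delay = delay
        self.last_change = None
        self.pending_errors = []
        self.shown_errors = []

    def on_fontify(self, now, errors):
        # Faces are applied right away; only the error report is deferred.
        self.last_change = now
        self.pending_errors = list(errors)
        self.shown_errors = []  # hide stale marks while the user is typing

    def tick(self, now):
        # Called periodically, like an idle timer check.
        if self.last_change is not None and now - self.last_change >= self.delay:
            self.shown_errors = self.pending_errors
```

An error that is fixed by the next keystroke within the delay is never
shown, which avoids flashing marks while typing.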

-- 
-- Stephe