emacs-devel

Re: cc-mode fontification feels random


From: Ergus
Subject: Re: cc-mode fontification feels random
Date: Sat, 12 Jun 2021 18:59:52 +0200

On Sat, Jun 12, 2021 at 05:56:34PM +0200, Theodor Thornhill wrote:
Stefan Monnier <monnier@iro.umontreal.ca> writes:

@Stefan - I'm not sure I understand what you mean by troublesome for
elisp hackers.  These grammars have a Lisp-like DSL and are pretty
usable through C-M-x and defvars; see:
https://github.com/emacs-csharp/csharp-mode/blob/master/csharp-tree-sitter.el#L44.

AFAIK the grammar itself is still written in Javascript.


Yeah, but compiled parsers can be supplied through CI or something like that.


[...]

Agreed.  Maybe a first step would be to get copyright assignments and
include the tree sitter module in GNU ELPA?


If I read some of these mails correctly it seems like that wouldn't be
possible due to interest from some of the parties involved in the main
package.  I don't know the details on that, though.  And Eli seems
unhappy with what's there.

As for making a little more concrete proposal for how to move forward,
would this be something like what we want?

- create/use c or rust bindings

Hi:
Eli and the others will give better info for sure, but just to start
(and also they may correct my ideas):

First, a "mode-local" initialization of the parser is needed, based on
the major mode (as explained in the TS doc). The parser should probably
be stored somewhere in the "mode" to avoid duplicating parsers for the
same language. This would probably run once per mode (so it could
perfectly well live on the Lisp side) and would be a wrapper that calls:

ts_parser_new
ts_parser_set_language

After that, on the C side, I think all we need is in buffer.{h,c}:
pass current_buffer->text->beg (or similar) directly to
ts_parser_parse_string or ts_parser_parse_string_encoding.
Here we must exclude the gap region, maybe with
ts_parser_set_included_ranges (all that information seems to be there
as macros in buffer.h).
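To make the gap handling concrete, here is a minimal sketch of reading
text around a gap in chunks. The struct gap_text and the function
gap_text_chunk are hypothetical stand-ins, not Emacs or tree-sitter
names; real code would take the values from the macros in buffer.h. The
function has roughly the shape of the read callback that ts_parser_parse
accepts through TSInput, which might be an alternative to included
ranges.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a buffer's text: one
   allocation holding the text with a gap somewhere inside it.  */
struct gap_text
{
  const char *beg;   /* start of the allocation */
  size_t gap_start;  /* logical byte offset where the gap begins */
  size_t gap_size;   /* size of the gap in bytes */
  size_t text_size;  /* bytes of real text, gap excluded */
};

/* Return a pointer to the contiguous run of text that starts at
   logical byte offset OFFSET and store its length in *LEN.
   Return NULL and set *LEN to 0 at or past the end of the text.  */
const char *
gap_text_chunk (const struct gap_text *t, size_t offset, size_t *len)
{
  assert (t->gap_start <= t->text_size);
  if (offset >= t->text_size)
    {
      *len = 0;
      return NULL;
    }
  if (offset < t->gap_start)
    {
      /* Before the gap: the run ends where the gap begins.  */
      *len = t->gap_start - offset;
      return t->beg + offset;
    }
  /* After the gap: skip over it; the run ends at the end of text.  */
  *len = t->text_size - offset;
  return t->beg + t->gap_size + offset;
}
```

A caller asks for a chunk at offset 0, then at offset + len, and so on
until it gets a zero length, so the gap bytes are never seen.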

Once we have a tree we associate it with the buffer it belongs to. And
then comes the rest.

- create an elisp-layer for interaction with the parse tree

Basically we need to expose some of the API, but it is better to handle
as much as we can on the C side, using simpler data types and handling
entire regions with the ts_tree_cursor_* functions. Of course, some of
them will still be needed for other functionality.
I don't know if we can manage font-locking from C, but I think we can
manage text properties.

So the next step is just to traverse the visible region of the tree and
convert the info into text properties.

Here we will need a sort of translation between the language symbols
(see ts_language_symbol_count) and font-lock faces.
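As a rough sketch of such a translation, a static table from node type
names to face names would do for a start. The table contents and the
name face_for_node_type are illustrative assumptions only; the node type
strings would come from the grammar (for example via ts_node_type or
ts_language_symbol_name), and a real mapping would probably be defined
per language.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of a hypothetical node-type to face translation table.  */
struct face_rule
{
  const char *node_type;  /* node type name as reported by the grammar */
  const char *face;       /* name of the font-lock face to apply */
};

/* Illustrative entries only; a real table would be per language.  */
static const struct face_rule face_rules[] = {
  { "comment", "font-lock-comment-face" },
  { "string_literal", "font-lock-string-face" },
  { "primitive_type", "font-lock-type-face" },
  { "type_identifier", "font-lock-type-face" },
};

/* Return the face name for NODE_TYPE, or NULL if the node type has
   no rule and should be left unfontified.  */
const char *
face_for_node_type (const char *node_type)
{
  assert (node_type != NULL);
  for (size_t i = 0; i < sizeof face_rules / sizeof face_rules[0]; i++)
    if (strcmp (face_rules[i].node_type, node_type) == 0)
      return face_rules[i].face;
  return NULL;
}
```

A linear scan is enough for a sketch; since symbols are small integers,
an array indexed by symbol id and built once per language would avoid
the string comparisons.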

- hook fontification and indentation into that elisp-layer


If I understood what Eli wants to prevent, then if we set the properties
and faces in step 2, these hooks may not be needed.

In most cases we will need to call ts_parser_parse_string somewhere in
`after-change-functions` (or maybe earlier, I don't know), passing it
the old tree, and then get the differences with the new one using
ts_tree_get_changed_ranges.

This returns something much smaller than the tree, so maybe we can
convert it into a Lisp list and use it in font-lock on the Lisp side if
we can't handle most of it in C.
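For the reparse to be incremental, the old tree first has to be told
what changed (ts_tree_edit takes byte offsets for that). Below is a
minimal sketch of deriving those offsets from the three arguments that
`after-change-functions` receives (BEG, END and the old length). The
names byte_edit and edit_from_change are hypothetical, and the sketch
assumes one byte per character, so real code would need to convert
character positions to byte positions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the byte fields an incremental parser
   needs to know about an edit.  */
struct byte_edit
{
  size_t start_byte;    /* where the change begins */
  size_t old_end_byte;  /* end of the replaced text before the change */
  size_t new_end_byte;  /* end of the new text after the change */
};

/* Build an edit from after-change-functions style arguments: BEG
   and END are 1-based positions delimiting the changed text as it
   is now, and OLD_LEN is the length of the text they replaced.
   Assumption: one byte per character.  */
struct byte_edit
edit_from_change (size_t beg, size_t end, size_t old_len)
{
  assert (beg >= 1 && end >= beg);
  struct byte_edit e;
  e.start_byte = beg - 1;                   /* 1-based to 0-based */
  e.old_end_byte = e.start_byte + old_len;
  e.new_end_byte = end - 1;
  return e;
}
```

Inserting three characters at position 5 gives BEG = 5, END = 8 and an
old length of 0; deleting two characters at position 10 gives
BEG = END = 10 and an old length of 2.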

It feels like the elisp-layer will be the easiest part.  I'm not really
well versed in where to look in the c code of emacs for where and how to
link this, so some pointers would be nice.

It looks like most people agree that tree sitter support is wanted, so
maybe it's time to start doing it?  I can surely have a stab at it, but
I'd like some guidance for how to proceed best - if it's wanted, that
is.

--
Theodor
