emacs-devel

Re: Tree-sitter maturity


From: Philip Kaludercic
Subject: Re: Tree-sitter maturity
Date: Fri, 27 Dec 2024 15:02:51 +0000

Philip Kaludercic <philipk@posteo.net> writes:

> Daniel Colascione <dancol@dancol.org> writes:
>
>> On December 27, 2024 9:19:12 AM EST, Philip Kaludercic <philipk@posteo.net> 
>> wrote:
>>>Daniel Colascione <dancol@dancol.org> writes:
>>>
>>>> On December 27, 2024 7:40:19 AM EST, Eli Zaretskii <eliz@gnu.org> wrote:
>>>>>> From: Philip Kaludercic <philipk@posteo.net>
>>>>>> Cc: Xiyue Deng <manphiz@gmail.com>,  emacs-devel@gnu.org
>>>>>> Date: Fri, 27 Dec 2024 10:54:29 +0000
>>>>>> 
>>>>>> Richard Stallman <rms@gnu.org> writes:
>>>>>> 
>>>>>> > If we add something like this to Emacs, there is an issue we need to
>>>>>> > take care about: to make carefully sure that it does not install
>>>>>> > any nonfree grammars.  I don't know how those grammars are released,
>>>>>> > or by whom, or how much they care about free software.  We can't
>>>>>> > take for granted that they do.
>>>>>> >
>>>>>> > Perhaps we could check automatically that the grammar found is properly
>>>>>> > licensed, and disregard any grammars that are not free.
>>>>>> >
>>>>>> > By contrast, if grammars are going to be packaged and released for
>>>>>> > distros, and chosen for installation by users, then it is the user's
>>>>>> > responsibility, not Emacs's responsibility, to reject the nonfree ones
>>>>>> > (and the GNU/Linux distro might insist on that).
>>>>>> 
>>>>>> It might take a while for that to happen, which is why I still believe
>>>>>> it would be better if tree-sitter major modes would populate
>>>>>> `treesit-language-source-alist' on their own, and point to the specific
>>>>>> checkouts that the major mode developer tested their implementation
>>>>>> against.
>>>>>
>>>>>We could have done that, but there's no way we could keep the value of
>>>>>treesit-language-source-alist up-to-date, because the grammar
>>>>>libraries put out new versions much more frequently than Emacs
>>>>>releases, especially if you consider libraries that have no official
>>>>>versions at all (in which case we can only point to some revision in
>>>>>their repository).
>>>>>
>>>>>The question that bothers me is how useful is it to have
>>>>>treesit-language-source-alist that is outdated?  What do we expect the
>>>>>users to do with such an outdated value?
>>>>>
>>>>
>>>> Why not just vendor all the grammars with the Emacs modes that use them?
>>>
>>>I am guessing part of the reason is that TS grammars are not fun to
>>>build.  IIRC they are specified in a JavaScript DSL (that used to
>>>require node.js but AFAIU works with other implementations as well),
>>>that a program written in Rust translates to C code.  So do we vendor
>>>the DSL and depend on the TreeSitter toolchain or do we vendor the
>>>generated code?
>>
>> It's a shame there's no way to write TS grammars in plain elisp. I
>> figure vendoring both the source and the generated code would be best,
>> as it'd allow building Emacs anywhere but still make it convenient on
>> systems with needed tools (JS runtime, Rust, etc.) to update and
>> modify the grammar. As with any scheme involving checking in generated
>> outputs, the source and output can get out of sync, but I think there
>> are build time guardrails we can build to make sure it doesn't happen.
>
> Writing the grammar in Elisp would require both a new toolchain and the
> effort of rewriting all the existing grammars in Elisp.  My
> understanding is that the main benefit TS intends to provide is the
> manpower already invested into writing grammars that deal with all the
> edge cases which traditional regexp/heuristic parsing had difficulties
> with; rewriting them would forfeit that.
>
> There is also the general point of helping to realise software freedom,
> where a -ts-mode makes it much more difficult (though of course not
> impossible) to adjust a grammar.  Wasn't there some complication when
> trying to reload a grammar?  The additional dependencies and the
> indirect effect of changes compared with Elisp are something we should be
> concerned about when trying to maintain "the spirit of Emacs" (which of
> course means different things to different people).
>
> Vendoring might help to reproduce builds if that turns out to be a big
> issue, but I am not a fan of the additional hurdles in making use of the
> source code.  Does anyone know of an alternative, less involved
> build chain that re-uses the libtree-sitter.so library?

Oh, and another point I have been reminded of while writing this: The
recent addition of more and more -ts-modes without "regular" -modes has
been slightly concerning.  While I understand that re-implementing a
"lua-mode" or "php-mode" from scratch is not an effort one wants to
impose on anyone, simpler modes such as dockerfile-mode or go-mod-mode
/without/ Tree Sitter would be a nice thing to have for people on
systems without Tree Sitter, or without the ability to download and
build code from GitHub (e.g. missing internet access, without Git/GCC,
without the necessary development libraries).  Even if the experience
were degraded, just re-using the keywords to provide some basic
highlighting would be a nice fallback.
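
To illustrate the keyword-based fallback described above, here is a
minimal sketch, not a proposal for the actual implementation.  All
`my-dockerfile-*' names are hypothetical, the keyword list is only an
approximation of the Dockerfile instruction set, and the dispatch
function assumes an Emacs that ships `treesit-ready-p' and
`dockerfile-ts-mode' (Emacs 29 or later); on anything else it simply
falls through to the regexp-based mode.

```elisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical sketch: the `my-dockerfile-*' names below are
;; illustrative only and do not exist in Emacs.

(defvar my-dockerfile-keywords
  '("FROM" "RUN" "CMD" "LABEL" "EXPOSE" "ENV" "ADD" "COPY"
    "ENTRYPOINT" "VOLUME" "USER" "WORKDIR" "ARG" "ONBUILD"
    "STOPSIGNAL" "HEALTHCHECK" "SHELL")
  "Instructions highlighted by `my-dockerfile-fallback-mode'.")

(defvar my-dockerfile-font-lock-keywords
  ;; Highlight an instruction only when it starts a line
  ;; (optionally after whitespace).
  `((,(concat "^[ \t]*" (regexp-opt my-dockerfile-keywords 'words))
     1 font-lock-keyword-face))
  "Font-lock rules for `my-dockerfile-fallback-mode'.")

(define-derived-mode my-dockerfile-fallback-mode prog-mode "Dockerfile"
  "Minimal regexp-based mode for Dockerfiles, without tree-sitter."
  ;; Treat `#' up to end of line as a comment.
  (modify-syntax-entry ?# "<")
  (modify-syntax-entry ?\n ">")
  (setq-local comment-start "# ")
  (setq-local comment-start-skip "#+[ \t]*")
  ;; Third element t: match instructions case-insensitively.
  (setq-local font-lock-defaults
              '(my-dockerfile-font-lock-keywords nil t)))

(defun my-dockerfile-mode-dispatch ()
  "Use `dockerfile-ts-mode' if its grammar is usable, else the fallback."
  (if (and (require 'treesit nil t)
           (fboundp 'treesit-ready-p)
           (treesit-ready-p 'dockerfile t) ; t = do not warn
           (fboundp 'dockerfile-ts-mode))
      (dockerfile-ts-mode)
    (my-dockerfile-fallback-mode)))
```

Such a dispatcher could then be registered with something like
(add-to-list 'auto-mode-alist '("Dockerfile\\'" . my-dockerfile-mode-dispatch)),
so that users without the grammar still get comment handling and basic
instruction highlighting, albeit with a degraded experience.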


