emacs-devel

Re: Tree-sitter maturity


From: Lynn Winebarger
Subject: Re: Tree-sitter maturity
Date: Sun, 29 Dec 2024 22:20:59 -0500



On Sun, Dec 29, 2024, 7:31 PM Yuan Fu <casouri@gmail.com> wrote:


> On Dec 29, 2024, at 3:29 PM, Björn Bidar <bjorn.bidar@thaodan.de> wrote:
>
> Daniel Colascione <dancol@dancol.org> writes:
>
>> Lynn Winebarger <owinebar@gmail.com> writes:
>>
>>> On Fri, Dec 27, 2024, 9:25 AM Daniel Colascione <dancol@dancol.org> wrote:
>>>
>>>>
>>>>
>>>> It's a shame there's no way to write TS grammars in plain elisp. I figure
>>>> vendoring both the source and the generated code would be best, as it'd
>>>> allow building Emacs anywhere but still make it convenient on systems with
>>>> needed tools (JS runtime, Rust, etc.) to update and modify the grammar. As
>>>> with any scheme involving checking in generated outputs, the source and
>>>> output can get out of sync, but I think there are build time guardrails we
>>>> can build to make sure it doesn't happen.
>>>>
>>>
>>> I looked into this last year.  The tree-sitter library provides a parsing
>>> engine that references a fairly standard LR type parsing table in binary
>>> form.  I got stuck on adding generic primitive functionality for reading
>>> and writing arbitrary binary data structures based on a data description
>>> DSL, since I wouldn't want to tie the interpreter core to the data
>>> structures of an external, dynamically-loadable library.  But, I wasn't
>>> sure such an extension would be accepted into emacs, as I am not an expert
>>> on the possible security implications.
>>>
>>> Other than that, emacs already has the code for calculating (LA)LR parsing
>>> tables in the semantic packages.  The tree-sitter grammar compiler may have
>>> additional logic for providing multiple starting symbols, but the parsing
>>> engine should still function with a classic parsing table.
>>
>> Thanks.  Such an approach would let us treat tree-sitter grammars a lot
>> more like font-lock-keywords, and I think for some modes, that'd be a
>> good option.  (Of course, SHTDI.)
>>
>> Tree sitter, as wonderful as it is, strikes me as a bit of a Rube
>> Goldberg machine architecturally: JS *and* Rust *and* C? Really? :-)

> I was wondering the same. How the hell? There had been some talk of
> supporting a more lightweight JavaScript interpreter as an alternative, but
> it hasn't gone anywhere, apparently for compatibility reasons. I don't see
> how node could be a dependency for these. Grammars mostly have no
> dependencies, except that some depend on other grammars at the
> source level; for example, the C++ grammar requires the C grammar.
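The quoted idea of a generic primitive for reading and writing arbitrary binary structures from a data-description DSL might look roughly like the following. This is a minimal Python sketch under stated assumptions. The description format (an ordered list of field name / `struct` format-code pairs) and the header fields are hypothetical, chosen for illustration. They do not describe tree-sitter's actual `TSLanguage` layout, and an Emacs version would be a Lisp primitive rather than Python.

```python
import struct
from typing import Any

# Hypothetical data description: ordered (field name, struct format code) pairs.
# The field names are illustrative only; they are not tree-sitter's real layout.
HEADER_DESC = [
    ("version", "I"),       # unsigned 32-bit integer
    ("symbol_count", "H"),  # unsigned 16-bit integer
    ("state_count", "H"),   # unsigned 16-bit integer
]


def _format(desc) -> str:
    # "<" selects little-endian byte order with no implicit padding.
    return "<" + "".join(code for _, code in desc)


def read_struct(desc, data: bytes, offset: int = 0) -> tuple[dict[str, Any], int]:
    """Decode one record described by DESC from DATA starting at OFFSET.

    Returns the decoded fields and the offset just past the record, so
    consecutive records can be read by chaining calls."""
    fmt = _format(desc)
    values = struct.unpack_from(fmt, data, offset)
    fields = dict(zip((name for name, _ in desc), values))
    return fields, offset + struct.calcsize(fmt)


def write_struct(desc, fields: dict[str, Any]) -> bytes:
    """Encode FIELDS according to DESC (the inverse of read_struct)."""
    return struct.pack(_format(desc), *(fields[name] for name, _ in desc))


if __name__ == "__main__":
    blob = write_struct(HEADER_DESC, {"version": 14, "symbol_count": 42, "state_count": 7})
    header, end = read_struct(HEADER_DESC, blob)
    print(header, end)  # {'version': 14, 'symbol_count': 42, 'state_count': 7} 8
```

Keeping the layout in data rather than in code is what would let the interpreter core stay independent of any particular external library's structures. Only the description would need to change when the library's format does.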

I don’t think you need nodejs to build the grammar. You might need it to develop the grammar, but compiling grammar.js to parser.c only requires the tree-sitter CLI which is written in Rust.

The grammar.js is written in a lispy way, and is interpreted by node to expand out to a JSON format.  See the middle of https://tree-sitter.github.io/tree-sitter/5-implementation.html :

==========
Parsing a Grammar
First, Tree-sitter must evaluate the JavaScript code in grammar.js and convert the grammar to a JSON format. It does this by shelling out to node. The format of the grammars is formally specified by the JSON schema in grammar.schema.json. The parsing is implemented in parse_grammar.rs.
===========

The resulting JSON representation of the grammar is then compiled by the parser (table) generator written in Rust.
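The table produced by a generator like this drives a conventional shift-reduce loop. That is why, as noted upthread, an engine that consumes a classic LR table should in principle also work with a table computed elsewhere, such as by the Semantic packages. The Python sketch below shows such a driver over a hand-built SLR(1) table for the toy grammar S' → E, E → E + n | n. The grammar, the table encoding, and `lr_parse` are assumptions made for illustration, not tree-sitter's internal API. Tree-sitter's real engine also handles GLR conflicts, error recovery, and incremental reparsing, none of which is shown here.

```python
# Toy grammar (illustrative): 0: S' -> E   1: E -> E + n   2: E -> n
# PRODUCTIONS maps a production number to (left-hand side, length of right-hand side).
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1)}

# Hand-built SLR(1) ACTION table: (state, lookahead) -> (kind, argument).
# FOLLOW(E) = {"+", "$"}, so the reduce actions appear on both lookaheads.
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2),
    (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1),
    (4, "$"): ("reduce", 1),
}
# GOTO table: (state, nonterminal) -> state.
GOTO = {(0, "E"): 1}


def lr_parse(tokens):
    """Run the table-driven shift-reduce loop over TOKENS.

    Returns the production numbers in the order they were reduced
    (a rightmost derivation in reverse), or raises SyntaxError."""
    stack = [0]                      # stack of parser states
    tokens = list(tokens) + ["$"]    # "$" marks end of input
    pos = 0
    reductions = []
    while True:
        state, tok = stack[-1], tokens[pos]
        action = ACTION.get((state, tok))
        if action is None:
            raise SyntaxError(f"unexpected {tok!r} at position {pos}")
        kind, arg = action
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, length = PRODUCTIONS[arg]
            del stack[-length:]                  # pop one state per RHS symbol
            stack.append(GOTO[(stack[-1], lhs)])  # then take the goto on the LHS
            reductions.append(arg)
        else:  # accept
            return reductions
```

For example, `lr_parse(["n", "+", "n"])` returns `[2, 1]`: the first `n` is reduced to E by production 2, and then `E + n` is reduced by production 1. Only the two tables are grammar-specific. The loop itself is the reusable engine.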

If the JavaScript form of the grammar only uses the functions defined by the tree-sitter node module (e.g. the "$" object, the "choice" function, etc.), it would be fairly trivial to transliterate into Lisp form, but it can incorporate arbitrary JS code as well.
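To illustrate how thin that layer is, here is a minimal sketch of a few DSL helpers transliterated into Python, each returning a rule object shaped like those in a generated `src/grammar.json` (`SYMBOL`, `STRING`, `PATTERN`, `SEQ`, `CHOICE`, `REPEAT`, `BLANK`). Python stands in here for the Lisp form under discussion. The helper names `sym` and `lit` and the toy grammar are hypothetical. Real grammar.json files also carry further top-level keys (such as `extras`) that are omitted here, and a grammar that relies on arbitrary JS code would not be covered by this approach.

```python
import json

# Hypothetical Python stand-ins for tree-sitter DSL functions. Each returns a
# dict shaped like the rule objects in a generated grammar.json.
def sym(name):          # plays the role of $.name in grammar.js
    return {"type": "SYMBOL", "name": name}

def lit(value):         # a literal string token
    return {"type": "STRING", "value": value}

def pattern(regex):     # a regex token, e.g. /[a-z]+/ in grammar.js
    return {"type": "PATTERN", "value": regex}

def _rule(x):
    # Bare strings in grammar.js denote string literals.
    return lit(x) if isinstance(x, str) else x

def seq(*members):
    return {"type": "SEQ", "members": [_rule(m) for m in members]}

def choice(*members):
    return {"type": "CHOICE", "members": [_rule(m) for m in members]}

def repeat(content):
    return {"type": "REPEAT", "content": _rule(content)}

def optional(content):
    # optional(x) is a choice between x and the empty (BLANK) rule.
    return choice(content, {"type": "BLANK"})

def grammar(name, rules):
    # The first rule listed is the start rule; Python dicts keep insertion order.
    return {"name": name, "rules": rules}


# Roughly corresponds to a grammar.js with:
#   source_file: $ => repeat($.assignment),
#   assignment:  $ => seq($.identifier, '=', $.number, optional(';')),
#   identifier:  $ => /[a-z]+/,
#   number:      $ => /\d+/,
toy = grammar("toy", {
    "source_file": repeat(sym("assignment")),
    "assignment": seq(sym("identifier"), "=", sym("number"), optional(";")),
    "identifier": pattern("[a-z]+"),
    "number": pattern(r"\d+"),
})

print(json.dumps(toy, indent=2))
```

The output is plain data, so whatever produces it (JavaScript, Python, or Lisp) is interchangeable as far as the Rust table generator described above is concerned.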

Lynn