emacs-devel

Re: Update on tree-sitter structure navigation


From: Dmitry Gutov
Subject: Re: Update on tree-sitter structure navigation
Date: Wed, 6 Sep 2023 15:47:42 +0300

On 06/09/2023 05:51, Danny Freeman wrote:

Dmitry Gutov <dgutov@yandex.ru> writes:

Hi Yuan,

On 02/09/2023 08:01, Yuan Fu wrote:
- Solve the grammar versioning/breaking-change problem: tree-sitter
grammars don't have a version number, so every time the author changes
the grammar, our queries break, and loading the mode only produces a
giant error.

I don't have a better idea than basically copying NeoVim and others: to
maintain the URLs to parser repositories and the ref of the latest known
good revision, for the current version of the major mode. That info
could be filled in by major modes themselves, e.g. in an autoload block
(similarly to how auto-mode-alist is appended to).
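
As a rough illustration of the "URL plus last known good ref" idea, here is a minimal Python sketch of such a registry. Everything in it is an assumption made for illustration: the names `GrammarPin`, `grammar_pins`, `register_grammar` and `fetch_commands` are hypothetical, and the URL and revision are placeholders rather than real recommendations. It only shows the shape of the data each major mode would contribute and how a pinned revision could be fetched.

```python
from typing import NamedTuple


class GrammarPin(NamedTuple):
    url: str       # parser repository
    revision: str  # last known good git ref (tag or commit)


# Hypothetical registry; each major mode would add its own entry,
# much like modes append to auto-mode-alist.
grammar_pins: dict[str, GrammarPin] = {}


def register_grammar(language: str, url: str, revision: str) -> None:
    """Record the known good grammar revision for LANGUAGE."""
    grammar_pins[language] = GrammarPin(url, revision)


def fetch_commands(language: str, dest: str) -> list[list[str]]:
    """Return git commands that check out the pinned revision into DEST."""
    pin = grammar_pins[language]
    return [
        ["git", "init", dest],
        ["git", "-C", dest, "fetch", "--depth", "1", pin.url, pin.revision],
        ["git", "-C", dest, "checkout", "FETCH_HEAD"],
    ]


# Placeholder URL and revision, for illustration only.
register_grammar("clojure", "https://example.invalid/tree-sitter-clojure", "v1.2.3")
```

The commands are returned rather than executed so that the caller decides when (and whether) to hit the network.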

clojure-ts-mode keeps a URL for the parser, but doesn't do anything
about the git revision. It easily could but I don't feel the need (yet)
since I am also a maintainer of the clojure grammar and know when we're
about to break grammar consumers.

Sure, that's easy enough to do when the package is only in ELPA: upgrade the grammar, upgrade the package, all in lockstep.

Unless NixOS or other distros start distributing it as well; then you'll need to care about a recent clojure-ts-mode being loaded with old versions of the grammar.

It's not quite that simple though. Some distributions (nixos for
example) are already providing pre-compiled grammars. That is how I
discovered a couple recent bugs in js-ts-mode, because the grammars
distributed with nixos 23.05 no longer worked on Emacs 30 after a patch
was applied that was supposed to be backwards compatible (a real pain to
verify in my experience).

A helpful find. ;)

With the way Emacs can load a grammar provided by the user's
distribution, keeping information about the version of the grammar in
the major mode doesn't help all that much. Even if we did it we have no
idea what version might have been built in the user's
.emacs.d/tree-sitter folder. That would require something like putting a
version number in the file name, or maybe applying a patch to the
grammar's C source that allowed us to get a version, SHA, something at
runtime.

Well, it would at least allow the user to rebuild the grammar to the version best known to work. Also, perhaps if the mode tracks the changes in the hash over time, it could see whether the grammar needs to be rebuilt. Finally, treesit-install-language-grammar could track which revision was last compiled.
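
The last point (recording which revision was last compiled) could look roughly like the Python sketch below. It assumes a small JSON state file kept next to the compiled grammars; the file layout and the names `record_build` and `needs_rebuild` are hypothetical and are not part of treesit-install-language-grammar.

```python
import json
from pathlib import Path


def record_build(state_file: Path, language: str, revision: str) -> None:
    """Remember which revision of LANGUAGE's grammar was last compiled."""
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    state[language] = revision
    state_file.write_text(json.dumps(state, indent=2))


def needs_rebuild(state_file: Path, language: str, known_good: str) -> bool:
    """Return True when no build is recorded or it differs from KNOWN_GOOD."""
    if not state_file.exists():
        return True
    state = json.loads(state_file.read_text())
    return state.get(language) != known_good
```

This only covers grammars built locally from Git; a grammar installed by a distribution leaves no such record, which is the harder case discussed next.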

So there is *something* we could do for the users who upgrade their grammars from Git.

Grammars distributed from distros are more of a problem, because it's not always a good idea to abort with "wrong version". But perhaps we could do that and recommend installing from Git in such cases anyway?

Another problem is that grammars don't have good versioning, and even if they did, we'd have to sometimes update the "upper bound" (we'd need coarse ranges rather than one fixed version requirement, right?) more frequently than Emacs is released. Less of a problem for modes in ELPA, though.
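
To make the "coarse ranges" idea concrete, here is a minimal Python sketch of a half-open range check. It assumes, hypothetically, that grammars carried dotted numeric versions, which they currently do not; `parse_version` and `in_range` are illustrative names only.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted numeric version such as "0.20.3" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def in_range(version: str, lower: str, upper: str) -> bool:
    """Return True when LOWER <= VERSION < UPPER (upper bound exclusive)."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)
```

The exclusive upper bound is the part that would need bumping between Emacs releases whenever a newer grammar turns out to be compatible.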

I'm not so sure we can have a great way to do this without a change to
the tree-sitter libraries. I would love to see some kind of increasing
version number generated in the grammar's C source that we could then
access. It could be used to make decisions about what queries to use, or
to warn the user they need to use a different grammar (maybe offering to
install a compatible version).
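
If such an increasing version number existed, choosing queries by version could be sketched as below. This is a Python illustration under stated assumptions: the integer grammar version is hypothetical (tree-sitter does not provide one in this discussion), and the node names in the query strings are made up.

```python
# (minimum grammar version, query), sorted by ascending minimum version.
# Versions and node names are invented for illustration.
QUERY_SETS = [
    (1, "(fn_definition name: (identifier) @name)"),
    (3, "(function_definition name: (identifier) @name)"),
]


def select_query(grammar_version: int) -> str:
    """Pick the newest query whose minimum version GRAMMAR_VERSION satisfies."""
    chosen = None
    for min_version, query in QUERY_SETS:
        if grammar_version >= min_version:
            chosen = query
    if chosen is None:
        # Too old for every known query: the point where a mode could
        # warn the user and offer to install a compatible grammar.
        raise LookupError(f"no queries for grammar version {grammar_version}")
    return chosen
```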

Yes, that would be an improvement, worth bringing up on the issue tracker maybe.

Tree-sitter grammar changes are almost always breaking changes. Adding
nodes can break things, renaming them and removing them definitely can.
I'm not sure any grammar consumer has a great way to deal with this
without always compiling the exact grammar they need and only ever using
it.

That's my conclusion as well for the time being.


