emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [SPAM UNSURE] Re: Tree Sitter (was Re: cc-mode fontification feels r


From: Daniel Colascione
Subject: Re: [SPAM UNSURE] Re: Tree Sitter (was Re: cc-mode fontification feels random)
Date: Sat, 24 Jul 2021 17:41:22 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0

On 7/24/21 1:05 PM, Stephen Leake wrote:

Daniel Colascione <dancol@dancol.org> writes:

On 7/21/21 12:15 PM, Perry E. Metzger wrote:
On 7/21/21 12:21, Daniel Colascione wrote:
On 7/21/21 7:43 AM, Perry E. Metzger wrote:
Thought I would note that there's a substantial literature now on
incremental parsing, especially the sort that is needed for editor
tools. One doesn't need to reinvent the algorithms, they're out
there waiting to be used. The Tree Sitter project is based on
previous published work.
There is indeed a big literature! I wish there were a bigger
literature on *composable* incremental parsers though. IMHO, what
we need is an incremental GLR system (yes, GLR is bad worst-case,
but it's not a practical concern) that spits out a parse *forest*
which we then pare down to a parse tree with ad-hoc syntactic
consistency rules. Something like this naturally supports
multi-language modes and incorporation of out-of-band semantic
information.

Tree sitter handles GLR.

Cool. How does it prune the parse forest?
wisi also uses GLR. It prunes trees during parse when the parse stacks
contained in the trees are identical; it uses error recover cost and
length to decide which tree to delete, or picks one at random. It's an
error if more than one tree is alive at the end of parse. That's because
programming languages must be unambiguous. It would be possible to adapt
the wisi parser to use some other pruning strategy.


Programs *as a whole*, properly understood by a compiler or execution environment, must be unambiguous. That's true. But when we're editing, we're dealing with program fragments, sometimes damaged by user modifications, and have to do our best given fragmentary information. All I'm suggesting is that it'd be useful to use language-specific semantic rules to disambiguate parse trees: for example, if in location L1, symbol T can be a type or a name, and in location L2, symbol T is definitely a type, then we should regard symbol T as a type in location L1 too. Iterate until we reach a fixed point, and only *then* apply the more general disambiguation strategies you've mentioned.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]