Re: [SPAM UNSURE] Re: Tree Sitter (was Re: cc-mode fontification feels r

emacs-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [SPAM UNSURE] Re: Tree Sitter (was Re: cc-mode fontification feels r

From:	Daniel Colascione
Subject:	Re: [SPAM UNSURE] Re: Tree Sitter (was Re: cc-mode fontification feels random)
Date:	Sat, 24 Jul 2021 17:41:22 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0

On 7/24/21 1:05 PM, Stephen Leake wrote:

Daniel Colascione <dancol@dancol.org> writes:

On 7/21/21 12:15 PM, Perry E. Metzger wrote:

On 7/21/21 12:21, Daniel Colascione wrote:

On 7/21/21 7:43 AM, Perry E. Metzger wrote:

Thought I would note that there's a substantial literature now on
incremental parsing, especially the sort that is needed for editor
tools. One doesn't need to reinvent the algorithms, they're out
there waiting to be used. The Tree Sitter project is based on
previous published work.

There is indeed a big literature! I wish there were a bigger
literature on *composable* incremental parsers though. IMHO, what
we need is an incremental GLR system (yes, GLR is bad worst-case,
but it's not a practical concern) that spits out a parse *forest*
which we then pare down to a parse tree with ad-hoc syntactic
consistency rules. Something like this naturally supports
multi-language modes and incorporation of out-of-band semantic
information.

Tree sitter handles GLR.

Cool. How does it prune the parse forest?

wisi also uses GLR. It prunes trees during parse when the parse stacks
contained in the trees are identical; it uses error recover cost and
length to decide which tree to delete, or picks one at random. It's an
error if more than one tree is alive at the end of parse. That's because
programming languages must be unambiguous. It would be possible to adapt
the wisi parser to use some other pruning strategy.

Programs *as a whole*, properly understood by a compiler or executionenvironment, must be unambiguous. That's true. But when we're editing,we're dealing with program fragments, sometimes damaged by usermodifications, and have to do our best given fragmentary information.All I'm suggesting is that it'd be useful to use language-specificsemantic rules to disambiguate parse trees: for example, if in locationL1, symbol T can be a type or a name, and in location L2, symbol T isdefinitely a type, then we should regard symbol T as a type in locationL1 too. Iterate until we reach a fixed point, and only *then* apply themore general disambiguation strategies you've mentioned.

[Prev in Thread]

Current Thread

[Next in Thread]

Re: Tree Sitter (was Re: cc-mode fontification feels random), (continued)

Prev by Date: Re: How to add pseudo vector types
Next by Date: Re: master e315544: ; * src/fileio.c (Fdirectory_append): Doc fix.
Previous by thread: Re: [SPAM UNSURE] Re: Tree Sitter (was Re: cc-mode fontification feels random)
Next by thread: Re: [SPAM UNSURE] Re: [SPAM UNSURE] Re: Tree Sitter (was Re: cc-mode fontification feels random)
Index(es):
- Date
- Thread