|
From: | Daniel Colascione |
Subject: | Re: [SPAM UNSURE] Re: Tree Sitter (was Re: cc-mode fontification feels random) |
Date: | Sat, 24 Jul 2021 17:41:22 -0700 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 |
On 7/24/21 1:05 PM, Stephen Leake wrote:
Daniel Colascione <dancol@dancol.org> writes:On 7/21/21 12:15 PM, Perry E. Metzger wrote:On 7/21/21 12:21, Daniel Colascione wrote:On 7/21/21 7:43 AM, Perry E. Metzger wrote:Thought I would note that there's a substantial literature now on incremental parsing, especially the sort that is needed for editor tools. One doesn't need to reinvent the algorithms, they're out there waiting to be used. The Tree Sitter project is based on previous published work.There is indeed a big literature! I wish there were a bigger literature on *composable* incremental parsers though. IMHO, what we need is an incremental GLR system (yes, GLR is bad worst-case, but it's not a practical concern) that spits out a parse *forest* which we then pare down to a parse tree with ad-hoc syntactic consistency rules. Something like this naturally supports multi-language modes and incorporation of out-of-band semantic information.Tree sitter handles GLR.Cool. How does it prune the parse forest?wisi also uses GLR. It prunes trees during parse when the parse stacks contained in the trees are identical; it uses error recover cost and length to decide which tree to delete, or picks one at random. It's an error if more than one tree is alive at the end of parse. That's because programming languages must be unambiguous. It would be possible to adapt the wisi parser to use some other pruning strategy.
Programs *as a whole*, properly understood by a compiler or execution environment, must be unambiguous. That's true. But when we're editing, we're dealing with program fragments, sometimes damaged by user modifications, and have to do our best given fragmentary information. All I'm suggesting is that it'd be useful to use language-specific semantic rules to disambiguate parse trees: for example, if in location L1, symbol T can be a type or a name, and in location L2, symbol T is definitely a type, then we should regard symbol T as a type in location L1 too. Iterate until we reach a fixed point, and only *then* apply the more general disambiguation strategies you've mentioned.
[Prev in Thread] | Current Thread | [Next in Thread] |