|
From: | Daniel Colascione |
Subject: | Re: Tree Sitter (was Re: cc-mode fontification feels random) |
Date: | Wed, 21 Jul 2021 09:21:38 -0700 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 |
On 7/21/21 7:43 AM, Perry E. Metzger wrote:
On 7/20/21 21:28, Stefan Monnier wrote:Thought I would note that there's a substantial literature now on incremental parsing, especially the sort that is needed for editor tools. One doesn't need to reinvent the algorithms, they're out there waiting to be used. The Tree Sitter project is based on previous published work.Alternately, you can only store the length of each token (as tree-sitterdoes); then when processing queries, you have to add up all the lengths of the preceding tokens in order to report the buffer position of the information you are computing. That is also linear in the length of the buffer.You can probably get better than linear performance in "the usual case" by storing the total length of the subtree at each node of the AST. It's still theoretically linear in the worst case, of course.
There is indeed a big literature! I wish there were a bigger literature on *composable* incremental parsers though. IMHO, what we need is an incremental GLR system (yes, GLR is bad worst-case, but it's not a practical concern) that spits out a parse *forest* which we then pare down to a parse tree with ad-hoc syntactic consistency rules. Something like this naturally supports multi-language modes and incorporation of out-of-band semantic information.
[Prev in Thread] | Current Thread | [Next in Thread] |