guile-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Parsing


From: Ian Grant
Subject: Parsing
Date: Mon, 15 Sep 2014 13:36:44 -0400

LALR parser generators are not the last word in parsing technology. Many unambiguous context-free languages are not well suited to LALR parsing. The C language is an exception: the C grammar from the ISO standard can be typed almost verbatim into a YACC/Bison file, and it will produce just the one well-documented shift/reduce conflict in the IF ELSE statement, which is fortuitously resolved the right way. But this is achieved at some cost, because the C grammar does not actually constrain the parsed statements to those which are well-formed. In addition to the grammar, the language definition includes a slew of extra conditions which need to be checked. These checks must be coded into a post-parsing phase, because they are not things that the LALR tables can encode.

Important examples of grammars which are not LALR parsable include ASN.1 and Standard ML. When Claudio Russo added higher order functors to Moscow ML, he apparently wasn't able to figure out how to get the LALR parser to parse the grammar for the language he'd invented. So the Moscow ML parser produces 37 shift/reduce conflicts. I don't think anyone actually knows how those higher order declarations are parsed in practice.

But I think the problem of parsing has actually been solved. Tom Ridge's "p3" combinator parsers with oracles will parse any unambiguous CFG in polynomial time complexity at most O^3, or possibly O^2, I need to check, in the length of the input. The advantage of combinator parsers is that they parse top-down, in a way which follows the natural structure of the grammar. So errors can be explained to the user (and the developer!) in terms of the actual derivations (parsing decisions) taken on the input, This is not the case with bottom-up parsers such as LALR.

It would be really good to have a p3-like parser generator working in Guile. The only technical part is integrating the oracle with the combinator parsers. Ridge's example uses an Earley parser, but he points out that any parser generator could be used as the oracle. So perhaps LALR parsers would do. The grammar the oracles need to be able to parse is much simpler than the general CFG's the combinators work on: the oracles are just there to make the decisions about where to split the left-recursive productions. But if the LALR oracle doesn't work out, then it is easy to write an Early parser in scheme. See http://www.cavar.me/damir/charty/scheme/ which is LGPLed.

See Ridge's papers at http://www.tom-ridge.com/parsing.html

Again, one cold implement the recursive parser combinators in scheme-generated lightning. If this were done abstractly then Guile could generate fast machine code parsers for any host language.

Ian



reply via email to

[Prev in Thread] Current Thread [Next in Thread]