Re: %union foo bar baz and others { ... }
From: Akim Demaille
Subject: Re: %union foo bar baz and others { ... }
Date: Sun, 02 Feb 2003 11:07:54 +0100
User-agent: Gnus/5.090008 (Oort Gnus v0.08) Emacs/21.2 (i386-pc-linux-gnu)
Paul> Akim Demaille <address@hidden> writes:
>> I'm curious: I would really like to know why you think the "dirty
>> hack" is dirtier than the current solution.
Paul> The dirty hack places undesirable constraints on the parser, since it
Paul> requires that actions be executed in a particular order, with the
Paul> order of actions' side effects being quite important. For example, I
Paul> don't offhand see how it would work correctly if we switch to a GLR
Paul> parser: I suppose it might work, but it might not (particularly in
Paul> error situations).
I should repeat that the dirty hack was only the closest code to what
I think the code should do in the future: *post* parse the actions.
Actions, and more generally any {}-code, should be completely opaque
to the grammar reader.
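
To illustrate what "opaque" could mean in practice, here is a minimal
sketch in Python (chosen for brevity; it is not Bison's code, and the
name read_braced_code is made up for this example). It assumes a C-like
host language and only tracks where strings, character literals and
comments start and end, so that a brace inside them is not counted.
Nothing inside the block is interpreted; $$ and $1 pass through
untouched.

```python
def read_braced_code(text, start):
    """Return (body, end) for the balanced {...} block opening at text[start].

    Hypothetical helper: the body is copied verbatim and `end` is the index
    just past the closing brace.
    """
    if text[start] != "{":
        raise ValueError("expected '{'")
    depth = 0
    i = start
    n = len(text)
    while i < n:
        c = text[i]
        if c in "\"'":
            # Skip a string or character literal, stepping over backslash pairs.
            i += 1
            while i < n and text[i] != c:
                i += 2 if text[i] == "\\" else 1
            i += 1
        elif text.startswith("/*", i):
            close = text.find("*/", i + 2)
            if close < 0:
                raise ValueError("unterminated comment")
            i = close + 2
        elif text.startswith("//", i):
            newline = text.find("\n", i)
            i = n if newline < 0 else newline + 1
        elif c == "{":
            depth += 1
            i += 1
        elif c == "}":
            depth -= 1
            i += 1
            if depth == 0:
                return text[start + 1:i - 1], i
        else:
            i += 1
    raise ValueError("unbalanced braces")


grammar = 'exp: exp PLUS exp { $$ = f("}", $1); /* } */ } ;'
body, end = read_braced_code(grammar, grammar.index("{"))
```

The grammar reader would then store body as a plain string and resume
reading at end, leaving any interpretation of the action to a later
pass.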
Paul> Another way to put it is that the dirty hack is a path from the parser
Paul> to the lexer, where the parser communicates context to the lexer, and
Paul> the lexer behaves differently depending on context supplied by the
Paul> parser. This leads to well-known problems; it is similar to the
Paul> problems with typedef'd identifiers in C. It is undesirable for that
Paul> reason.
Paul> In the Bison 1.875 approach, there is no such communication. The
Paul> lexer is responsible for keeping track of the context. The parser
Paul> doesn't need to worry about supplying the context to the lexer. And
Paul> the context does not need to be recorded in a static variable, so this
Paul> doesn't hurt the reentrancy of the lexer or parser. These are all
Paul> technical advantages.
Paul> Of course, these advantages do not come without cost, since it does
Paul> complicate the lexer to scan the larger "token" that includes both
Paul> (say) "%union" and the braced code that comes after the "%union".
Which also kills the flexibility of the syntax, which is the whole
point of having a grammar file.
>> And again, to me, the "dirty hack" is just a path to parsing the
>> actions elsewhere, i.e., it is the best approximation for the time
>> being, the code that leaves the scanners and parsers as close as
>> possible to what it will be in the future.
Paul> To understand this argument better I'd like to know more about how you
Paul> plan to deal with scanning actions in the future. A naive approach
Paul> would be to scan the braced code twice: once in scan-gram.l, which
Paul> simply returns a string containing the action's contents; and once in
Paul> a routine invoked by the semantic analyzer that rescans the string
Paul> looking for $$, $1, etc., and substituting as it goes.
Paul> Unfortunately this naive approach would have other problems, as it
Paul> would require us to write two scanners for actions, and each
Paul> scanner would have to know the intricacies of C lexical analysis,
Paul> including such brain damaged features as UCNs, multibyte characters,
Paul> and backslash-newline. (The problems would be different with non-C
Paul> languages, of course, but I'm just trying to see how the C solution
Paul> would work.)
I agree with all this, although I am still waiting to see someone
exercise all these features in a real-world Bison file. Too bad there
is no Flex-includes, but we will have to deal with that.
Some languages, such as Perl, are not particularly happy with our uses
of $ and @. Detaching the parsing of the action will make it possible
to support a wider variety of {} syntaxes.
Paul> So I guess you must be thinking of another approach, in which
Paul> scan-gram.l escapes $ and @ in a safe way, such that the semantic
Paul> analyzer can simply walk through the string looking for a single
Paul> escape character without having to understand the lexical rules
Paul> of C.
No, I really am thinking of double scanning.
Paul> A downside of this approach is that the braced code will still need to
Paul> be scanned twice, with twice the CPU and memory costs; but I guess
Paul> efficiency is not that big of a deal these days. However, another
Paul> downside is that both scans will have to worry about multibyte
Paul> characters, which will be a hassle; or we'll have to convert back and
Paul> forth between multibyte and wide characters, which will be another
Paul> hassle.
I don't quite follow you here. I don't see why Bison should peek at
the host language's escaping rules. It needs to know where
string, character, and comment literals start and end. The "meaning"
of them is left to the compiler.
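
As a rough sketch of such a second scan (in Python for brevity, and not
Bison's code): the pattern below only recognizes the boundaries of
C-like strings, character literals and comments, copies them through
untouched, and rewrites $$ and $N everywhere else. The function name
substitute and the replacement spellings yyval and yyvsp[...] are
assumptions made for this example, and typed references such as
$<type>1 are deliberately left out.

```python
import re

# One alternative per construct whose boundaries matter. The contents of
# literals and comments are never decoded, only stepped over.
_TOKEN = re.compile(r'''
    "(?:\\.|[^"\\])*"      # string literal
  | '(?:\\.|[^'\\])*'      # character literal
  | /\*.*?\*/              # block comment
  | //[^\n]*               # line comment
  | (?P<lhs>\$\$)          # $$
  | \$(?P<rhs>[0-9]+)      # $N
''', re.VERBOSE | re.DOTALL)


def substitute(action, rhs_len):
    """Rewrite $$ and $N in `action` for a rule with `rhs_len` symbols.

    Hypothetical helper: the output names are illustrative only.
    """
    def repl(match):
        if match.group("lhs"):
            return "yyval"
        if match.group("rhs"):
            n = int(match.group("rhs"))
            return f"yyvsp[{n - rhs_len}]"
        return match.group(0)  # literal or comment: copy through unchanged

    return _TOKEN.sub(repl, action)


result = substitute(' $$ = f("$1", $1, \'$\'); /* $2 */ ', 3)
```

Here the $1 inside the string, the '$' character literal and the $2
inside the comment are all left alone; only the bare $$ and $1 are
rewritten. Supporting another host language would mean swapping the
literal and comment alternatives, not teaching the grammar scanner
anything new.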
Paul> I suspect that any attempt to go to the "future approach" that I
Paul> guessed at above will require many more changes to the parser and
Paul> lexer than the 20 to 30 lines at issue here. The switch to the
Paul> "future approach" will be dozens or hundreds of times more
Paul> complicated. I don't see why this current minor change would have
Paul> much effect on the feasibility of the big "future approach" change.
For instance, the BT parser should be introduced soon. It will
require something like %! { some code which is always executed, even
when in non-deterministic exploration }. Your approach requires
another set of hoops to jump through, while the parser approach does
not. The parser approach keeps us free from constraints such as "the
keyword must come immediately before the braces".