Re: %union foo bar baz and others { ... }

bison-patches



From:	Akim Demaille
Subject:	Re: %union foo bar baz and others { ... }
Date:	Sun, 02 Feb 2003 11:07:54 +0100
User-agent:	Gnus/5.090008 (Oort Gnus v0.08) Emacs/21.2 (i386-pc-linux-gnu)

 Paul> Akim Demaille <address@hidden> writes:
 >> I'm curious: I would really like to know why you think the "dirty
 >> hack" is dirtier than the current solution.

 Paul> The dirty hack places undesirable constraints on the parser, since it
 Paul> requires that actions be executed in a particular order, with the
 Paul> order of actions' side effects being quite important.  For example, I
 Paul> don't offhand see how it would work correctly if we switch to a GLR
 Paul> parser: I suppose it might work, but it might not (particularly in
 Paul> error situations).

I should repeat that the dirty hack was only the closest approximation,
in today's code, to what I think Bison should do in the future:
*post*-parse the actions.  Actions, and more generally any {}-code,
should be completely opaque to the grammar reader.

 Paul> Another way to put it is that the dirty hack is a path from the parser
 Paul> to the lexer, where the parser communicates context to the lexer, and
 Paul> the lexer behaves differently depending on context supplied by the
 Paul> parser.  This leads to well-known problems; it is similar to the
 Paul> problems with typedef'd identifiers in C.  It is undesirable for that
 Paul> reason.
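To make the parser-to-lexer path concrete, here is a minimal sketch of the classic C typedef-name case. All names (`parser_declare_typedef`, `lexer_classify`, the token codes) are hypothetical and are not taken from Bison. The lexer's answer depends on a static table that the parser's actions fill in, so it is correct only if the action has already run when the lexer scans the next identifier. The static state also keeps both sides from being reentrant.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token codes, for illustration only. */
enum token { TOK_IDENT, TOK_TYPENAME };

/* Hypothetical feedback table shared by parser and lexer.  Being static,
   it is global state: neither side is reentrant. */
#define MAX_TYPEDEFS 16
static const char *typedef_names[MAX_TYPEDEFS];
static int n_typedefs;

/* Called from a parser action after a typedef declaration is reduced. */
void parser_declare_typedef(const char *name) {
    assert(n_typedefs < MAX_TYPEDEFS);   /* sketch: fixed capacity */
    typedef_names[n_typedefs++] = name;
}

/* The lexer's classification depends on which parser actions have
   already been executed, i.e. on the order of their side effects. */
enum token lexer_classify(const char *word) {
    for (int i = 0; i < n_typedefs; i++)
        if (strcmp(typedef_names[i], word) == 0)
            return TOK_TYPENAME;
    return TOK_IDENT;
}
```

The same word is classified differently before and after the parser's action runs, which is why action ordering (and, under GLR, deferred or discarded actions) matters for this design.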

 Paul> In the Bison 1.875 approach, there is no such communication.  The
 Paul> lexer is responsible for keeping track of the context.  The parser
 Paul> doesn't need to worry about supplying the context to the lexer.  And
 Paul> the context does not need to be recorded in a static variable, so this
 Paul> doesn't hurt the reentrancy of the lexer or parser.  These are all
 Paul> technical advantages.

 Paul> Of course, these advantages do not come without cost, since it does
 Paul> complicate the lexer to scan the larger "token" that includes both
 Paul> (say) "%union" and the braced code that comes after the "%union".

It also kills the flexibility of the syntax, and that flexibility is
the whole point of having a grammar file.
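A minimal sketch of the 1.875-style approach, in which "%union" and the braced code after it are scanned as one token. The function names are hypothetical, and this sketch allows only whitespace between the keyword and the brace (an assumption made for illustration). All scanning state lives in local variables, so no context is shared with the parser. The sketch also shows the rigidity discussed above: `%union foo { }` is simply not recognized.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer just past the '}' that matches the '{' at P, or NULL
   if the input ends first.  Braces inside string and character literals
   and inside comments are ignored; escapes are stepped over, never
   interpreted. */
static const char *skip_braced_code(const char *p) {
    int depth = 0;
    assert(*p == '{');
    while (*p) {
        if (*p == '"' || *p == '\'') {
            char quote = *p++;
            while (*p && *p != quote)
                p += (*p == '\\' && p[1]) ? 2 : 1;
            if (!*p)
                return NULL;            /* unterminated literal */
            p++;                        /* skip the closing quote */
        } else if (p[0] == '/' && p[1] == '*') {
            const char *close = strstr(p + 2, "*/");
            if (!close)
                return NULL;            /* unterminated comment */
            p = close + 2;
        } else if (p[0] == '/' && p[1] == '/') {
            while (*p && *p != '\n')
                p++;
        } else {
            if (*p == '{')
                depth++;
            else if (*p == '}' && --depth == 0)
                return p + 1;
            p++;
        }
    }
    return NULL;
}

/* Scan "%union" followed by braced code as a single token.  On success,
   store the bounds of the braced code in *CODE and *CODE_LEN and return
   a pointer past the token; otherwise return NULL. */
const char *scan_union_token(const char *p, const char **code,
                             size_t *code_len) {
    static const char kw[] = "%union";
    if (strncmp(p, kw, sizeof kw - 1) != 0)
        return NULL;
    p += sizeof kw - 1;
    while (isspace((unsigned char) *p))
        p++;
    if (*p != '{')
        return NULL;                    /* keyword must directly precede '{' */
    const char *end = skip_braced_code(p);
    if (!end)
        return NULL;
    *code = p;
    *code_len = (size_t) (end - p);
    return end;
}
```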

 >> And again, to me, the "dirty hack" is just a path to parsing the
 >> actions elsewhere, i.e., it is the best approximation for the time
 >> being, the code that leaves the scanners and parsers as close as
 >> possible to what they will be in the future.

 Paul> To understand this argument better I'd like to know more about how you
 Paul> plan to deal with scanning actions in the future.  A naive approach
 Paul> would be to scan the braced code twice: once in scan-gram.l, which
 Paul> simply returns a string containing the action's contents; and once in
 Paul> a routine invoked by the semantic analyzer that rescans the string
 Paul> looking for $$, $1, etc., and substituting as it goes.
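A minimal sketch of the naive second pass described above, assuming the grammar scanner has already returned the action as a string. The replacement spellings `yylhs` and `yyrhs[N]` and the helper names are hypothetical placeholders, not what Bison actually emits. Even this small version must skip string and character literals and `/* */` comments so that a `$` inside them is left alone. It still ignores `//` comments, backslash-newline, UCNs and multibyte characters, which is the duplicated C lexical knowledge being discussed.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Growable output buffer (hypothetical helper for this sketch). */
struct buf { char *s; size_t len, cap; };

static void buf_add(struct buf *b, const char *s, size_t n) {
    if (b->len + n + 1 > b->cap) {
        b->cap = 2 * (b->len + n + 1);
        b->s = realloc(b->s, b->cap);
        assert(b->s);                   /* sketch: no real error handling */
    }
    memcpy(b->s + b->len, s, n);
    b->len += n;
    b->s[b->len] = '\0';
}

/* Rescan an action string, replacing $$ with "yylhs" and $N with
   "yyrhs[N]".  Literals and comments are copied verbatim.  The caller
   frees the result. */
char *translate_action(const char *p) {
    struct buf out = { NULL, 0, 0 };
    buf_add(&out, "", 0);               /* start with an empty string */
    while (*p) {
        const char *start = p;
        if (*p == '"' || *p == '\'') {  /* string or character literal */
            char quote = *p++;
            while (*p && *p != quote)
                p += (*p == '\\' && p[1]) ? 2 : 1;
            if (*p)
                p++;
        } else if (p[0] == '/' && p[1] == '*') {
            const char *close = strstr(p + 2, "*/");
            p = close ? close + 2 : p + strlen(p);
        } else if (p[0] == '$' && p[1] == '$') {
            buf_add(&out, "yylhs", 5);
            p += 2;
            continue;
        } else if (p[0] == '$' && isdigit((unsigned char) p[1])) {
            char tmp[32];
            char *endp;
            long n = strtol(p + 1, &endp, 10);
            int k = snprintf(tmp, sizeof tmp, "yyrhs[%ld]", n);
            buf_add(&out, tmp, (size_t) k);
            p = endp;
            continue;
        } else {
            p++;
        }
        buf_add(&out, start, (size_t) (p - start));
    }
    return out.s;
}
```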

 Paul> Unfortunately this naive approach would have other problems, as it
 Paul> would require us to write two scanners for actions, and each
 Paul> scanner would have to know the intricacies of C lexical analysis,
 Paul> including such brain-damaged features as UCNs, multibyte characters,
 Paul> and backslash-newline.  (The problems would be different with non-C
 Paul> languages, of course, but I'm just trying to see how the C solution
 Paul> would work.)

I agree with all this, although I am still waiting to see someone
exercise all these features in a real-world Bison file.  Too bad Flex
has no include mechanism that would let two scanners share rules, but
we will have to deal with that.

Some languages, such as Perl, are not particularly happy with our uses
of $ and @.  Detaching the parsing of the actions will make it possible
to support a wider range of {} syntaxes.

 Paul> So I guess you must be thinking of another approach, in which
 Paul> scan-gram.l escapes $ and @ in a safe way, such that the semantic
 Paul> analyzer can simply walk through the string looking for a single
 Paul> escape character without having to understand the lexical rules
 Paul> of C.
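One reading of this escaping approach, sketched under loud assumptions: the grammar scanner, which already understands C literals and comments, puts a hypothetical escape byte `META_ESC` before each genuine `$$`, `$N` or `@N` and doubles any `META_ESC` that occurs in the user's own text. The semantic analyzer then only has to look for that byte. The escape byte, the function name and the replacement spellings `yylhs`, `yyrhs[N]` and `yylocs[N]` are all illustrative, not Bison's.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical escape byte assumed to be inserted by the grammar scanner
   before every genuine $/@ reference, and doubled elsewhere. */
#define META_ESC '\x01'

/* Expand escaped references in CODE into OUT (OUTSIZE bytes).  No C
   lexical knowledge is needed here.  Returns 0 on success, -1 if OUT is
   too small or an escape is malformed. */
int expand_escaped(const char *code, char *out, size_t outsize) {
    size_t len = 0;
    assert(code != NULL && out != NULL);
    if (outsize == 0)
        return -1;
    while (*code) {
        char piece[32];
        size_t n;
        if (*code != META_ESC) {               /* ordinary byte */
            piece[0] = *code++;
            n = 1;
        } else if (code[1] == META_ESC) {      /* doubled: literal byte */
            piece[0] = META_ESC;
            n = 1;
            code += 2;
        } else if (code[1] == '$' && code[2] == '$') {
            n = (size_t) snprintf(piece, sizeof piece, "yylhs");
            code += 3;
        } else if ((code[1] == '$' || code[1] == '@')
                   && isdigit((unsigned char) code[2])) {
            char *end;
            long k = strtol(code + 2, &end, 10);
            n = (size_t) snprintf(piece, sizeof piece,
                                  code[1] == '$' ? "yyrhs[%ld]" : "yylocs[%ld]",
                                  k);
            code = end;
        } else {
            return -1;                         /* malformed escape */
        }
        if (len + n >= outsize)
            return -1;                         /* leave room for the NUL */
        memcpy(out + len, piece, n);
        len += n;
    }
    out[len] = '\0';
    return 0;
}
```

The cost of this design moves into the first scanner, which must still know where C literals and comments begin and end in order to decide which `$` and `@` to escape.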

No, I really am thinking of double scanning.

 Paul> A downside of this approach is that the braced code will still need to
 Paul> be scanned twice, with twice the CPU and memory costs; but I guess
 Paul> efficiency is not that big of a deal these days.  However, another
 Paul> downside is that both scans will have to worry about multibyte
 Paul> characters, which will be a hassle; or we'll have to convert back and
 Paul> forth between multibyte and wide characters, which will be another
 Paul> hassle.

I don't quite follow you here.  I don't see why Bison should peek at
the host language's escaping rules.  It needs to know where string,
character and comment literals start and end.  Their "meaning" is
left to the compiler.

 Paul> I suspect that any attempt to go to the "future approach" that I
 Paul> guessed at above will require many more changes to the parser and
 Paul> lexer than the 20 to 30 lines at issue here.  The switch to the
 Paul> "future approach" will be dozens or hundreds of times more
 Paul> complicated.  I don't see why this current minor change would have
 Paul> much effect on the feasibility of the big "future approach" change.

For instance, the BT parser should be introduced soon.  It will
require something like %! { some code which is always executed, even
during nondeterministic exploration }.  Your approach requires yet
another set of hoops to jump through, while the parser approach does
not.  The parser approach keeps us free from constraints such as "the
keyword must come immediately before the braced code".
