bug-bison

RE: Bison & Unicode


From: Hans Aberg
Subject: RE: Bison & Unicode
Date: Wed, 20 Dec 2000 01:03:55 +0100

At 14:13 -0600 0-12-19, Keefe, Dan wrote:
>While I agree that this would be a possible solution, I think that keeping
>a parser (Bison) and a lexer (Flex) separate makes more sense.  Currently
>Flex (or lex) is responsible for tokenization and makes these tokens available
>for Bison (or yacc). Since Bison only cares about what type of token it
>gets in order to determine what rule to follow, there should be no
>dependency between the current state in the parser and the token input
>stream structure since the token type can be determined in the lexer
>exclusively.  On the other hand, semantic actions need to avail
>themselves of the text (or converted value of the text) of the current
>token.  In this situation, it must be the case that the parser has
>knowledge of at least the data type being used by the lexer in order to
>pass this data to other functional parts of the program.  It seems to me
>that exposing the lexer's data type is the fundamental problem and this
>could be done by commonly declared type for the lexer and parser.
>Although it may be nice to also expose the encoding scheme and code page
>to the parser, this is certainly not necessary since this can be
>encapsulated exterior to the parser.
>
>I think that embedding lexical analysis capability within a parser would
>reduce the level of abstraction of these separate concerns and cause more
>problems than it would be worth.

Which lexical analyzer do you intend to use with Unicode? :-)

-- By the way, there is a combined lexical-analyzer/compiler-compiler,
http://www.antlr.org/.

-- But in my post I suggested that Bison be invoked twice, so
lexing/parsing would still be separate (yyparse of one becomes yylex of the
other). Even so, I think that if one creates a "language hierarchy", with
the sentences of the language below being the tokens of the one above, it
can be reduced to a single grammar, by simply making the sentence-tokens
into variables.
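
To make the layering concrete, here is a minimal sketch in Python rather than in Bison itself. It only illustrates the idea that the parse routine of the lower level plays the role of yylex for the upper level. The toy grammar (digits and "+") and all function names are assumptions made up for this illustration; nothing here is generated by or part of Bison.

```python
from typing import Callable, Optional, Tuple

Token = Tuple[str, str]  # (token type, matched text)


def make_char_lexer(text: str) -> Callable[[], Optional[str]]:
    """Lowest level: hands out one character per call, None at end of input."""
    chars = iter(text)
    return lambda: next(chars, None)


def make_token_parser(
    next_char: Callable[[], Optional[str]]
) -> Callable[[], Optional[Token]]:
    """Lower-level 'parser' over characters.

    Each call recognizes one sentence of the character-level language
    (a NUMBER or a PLUS) and returns it as a single token, so this
    function can serve as the lexer of the level above.
    """
    lookahead = next_char()  # one character of lookahead

    def parse_one() -> Optional[Token]:
        nonlocal lookahead
        while lookahead is not None and lookahead.isspace():
            lookahead = next_char()
        if lookahead is None:
            return None  # end of input
        if lookahead == "+":
            lookahead = next_char()
            return ("PLUS", "+")
        if lookahead in "0123456789":
            digits = ""
            while lookahead is not None and lookahead in "0123456789":
                digits += lookahead
                lookahead = next_char()
            return ("NUMBER", digits)
        raise SyntaxError(f"unexpected character {lookahead!r}")

    return parse_one


def parse_sum(next_token: Callable[[], Optional[Token]]) -> int:
    """Upper-level parser for: sum := NUMBER (PLUS NUMBER)*"""

    def expect_number() -> int:
        tok = next_token()
        if tok is None or tok[0] != "NUMBER":
            raise SyntaxError(f"expected NUMBER, got {tok!r}")
        return int(tok[1])

    total = expect_number()
    while (tok := next_token()) is not None:
        if tok[0] != "PLUS":
            raise SyntaxError(f"expected PLUS, got {tok!r}")
        total += expect_number()
    return total


def evaluate(text: str) -> int:
    # The parse routine of the lower level is passed in as the
    # lexer of the upper level.
    return parse_sum(make_token_parser(make_char_lexer(text)))
```

The upper parser never sees characters; it sees only the token type and text that the lower level hands back. Merging the two levels into a single grammar would amount to inlining the NUMBER and PLUS rules into the grammar for sums.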

-- I have no idea how to get by doing such a thing; I just want to know if
there is some obvious reason for not trying it, such as efficiency.

  Hans Aberg
                  * Email: Hans Aberg <mailto:address@hidden>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>




