Re: RFC: enum instead of #define for tokens
From: Hans Aberg
Subject: Re: RFC: enum instead of #define for tokens
Date: Sat, 6 Apr 2002 00:01:11 +0200
At 11:54 -0800 2002/04/05, Paul Eggert wrote:
>> Is this cross compiler problem common?
>
>It depends on what you mean by "common". If you use EBCDIC it's
>common. If you use the non-ASCII part of ISO 8859-1 and are
>collaborating with someone else who's using some other character set
>in the ISO 8859 series, it's common. Assuming Bison supports
>multibyte character sets properly (isn't that how we got started on
>this thread?), a similar problem occurs with the non-ASCII parts of
>UTF-8, EUC-JP, shift-JIS, etc.
The point is what happens if you run Bison on one platform and then
transport the output sources to another. But the problem may show up
somewhere anyway. One ends up with questions that ultimately have to do
with the failings of C/C++, not Bison.
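To make the cross-platform problem concrete: the codes for character tokens such as 'a' are fixed on the machine where Bison runs, while the lexer's character literals take their values from the execution character set of the C compiler on the target. The sketch below is a hypothetical runtime guard (the name host_charset_is_ascii is invented here; nothing like it is claimed to be generated by Bison) that a program could call at startup to detect that its parser tables were built for ASCII while its compiler uses some other character set.

```c
/* Hypothetical startup guard, not part of Bison's output.
   Returns 1 if the compiler's execution character set gives the same
   codes as ASCII for a few characters a grammar might use, 0 otherwise.
   Under EBCDIC (code page 037), for example, 'A' is 0xC1, 'a' is 0x81
   and '0' is 0xF0, so this returns 0 there. */
int host_charset_is_ascii(void)
{
    return 'A' == 0x41 && 'a' == 0x61 && '0' == 0x30
        && '+' == 0x2B && '(' == 0x28 && ')' == 0x29;
}
```

It only spot-checks a handful of characters; a thorough check would have to cover every character literal the grammar actually uses.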
>> -- Note that the problem does not exist for Unicode UTF-n encodings
>
>Only if everyone agrees to use that particular extension to ASCII.
I'm not sure what you mean here. If Bison has a Unicode feature that can be
turned on, it will work only for Unicode UTF-n streams with n >= 21, but
those agree on every platform: the compacted yytranslate[] table will be
the same everywhere. Further, Linux evidently already uses UTF-32, so as
far as GNU is concerned, it should be a non-issue.
It is probably only the backward MSOS that uses UTF-16, but that isn't GNU.
If one uses UTF-16 and no symbols requiring more than one 16-bit code unit
(that is, no surrogate pairs), then the yytranslate[] table will be the
same as for UTF-n, n >= 21.
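The UTF-16 remark can be illustrated with a small decoder. In the sketch below (the function name utf16_next is hypothetical), any 16-bit unit outside the surrogate range 0xD800 to 0xDFFF is returned unchanged as the code point, which is why a table indexed by such units agrees with one indexed by UTF-32 values; only surrogate pairs need to be combined.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one code point from a UTF-16 buffer s
   holding n >= 1 code units. Stores the number of units consumed in
   *used and returns U+FFFD for an unpaired surrogate. */
uint32_t utf16_next(const uint16_t *s, size_t n, size_t *used)
{
    uint16_t hi = s[0];
    *used = 1;

    /* Outside the surrogate range the code unit IS the code point, so a
       translation table indexed by it matches one indexed by UTF-32. */
    if (hi < 0xD800 || hi > 0xDFFF)
        return hi;

    /* A high surrogate (D800..DBFF) must be followed by a low one (DC00..DFFF). */
    if (hi <= 0xDBFF && n >= 2 && s[1] >= 0xDC00 && s[1] <= 0xDFFF) {
        *used = 2;
        return 0x10000u + (((uint32_t)(hi - 0xD800) << 10)
                           | (uint32_t)(s[1] - 0xDC00));
    }
    return 0xFFFD; /* unpaired surrogate: replacement character */
}
```

For example, U+00E9 is the single unit 0x00E9, while U+1F600 needs the pair 0xD83D 0xDE00.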
>> Note that one may want to use the yytranslate[] table as is if one is using
>> distributed programming, say a WWW-browser reading ASCII on an EBCDIC
>> computer.
>
>Yes, that's the sort of scenario I was worried about.
But here it is a desirable feature: run Bison on the ASCII platform only,
and the output will compile correctly on the EBCDIC computer, with its
tables still keyed to ASCII.
The alternative would be to write sources like
char ASCII_a = 0x61;  /* 'a' in ASCII (0x41 is 'A') */
...
and then hand-write the lexer using those constants. Someone writing a WWW
server told me this is what he was doing. Extremely painful.
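A sketch of what that hand-written approach looks like in C, assuming ASCII input and a toy token set (all names here are hypothetical): every character the lexer cares about needs its own numeric constant, and the lexer compares raw input bytes against those constants instead of against character literals, so it behaves the same whether the compiler's execution character set is ASCII or EBCDIC.

```c
/* Hypothetical ASCII code constants; using numbers rather than
   character literals keeps them independent of the host charset. */
enum {
    ASCII_A = 0x41, ASCII_Z = 0x5A,
    ASCII_a = 0x61, ASCII_z = 0x7A,
    ASCII_0 = 0x30, ASCII_9 = 0x39,
    ASCII_SPACE = 0x20, ASCII_PLUS = 0x2B
};

/* Hypothetical token classes for a toy lexer. */
enum token_class { TC_LETTER, TC_DIGIT, TC_SPACE, TC_PLUS, TC_OTHER };

/* Classify one byte of ASCII input, whatever charset the compiler uses. */
enum token_class classify_ascii(unsigned char c)
{
    if ((c >= ASCII_a && c <= ASCII_z) || (c >= ASCII_A && c <= ASCII_Z))
        return TC_LETTER;
    if (c >= ASCII_0 && c <= ASCII_9)
        return TC_DIGIT;
    if (c == ASCII_SPACE)
        return TC_SPACE;
    if (c == ASCII_PLUS)
        return TC_PLUS;
    return TC_OTHER;
}
```

Using an enum rather than char variables at least lets the constants appear in switch cases, but the bookkeeping still grows with every character the grammar mentions.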
One ends up with the question of which encodings the parser and lexer
should be able to handle.
Under C++, this can be done by hooking a code converter (a std::codecvt
facet) onto the IO streams. Thus, if one decides to settle on Unicode
UTF-n, n >= 21, internally in Flex/Bison, then the generated combined
lexer/parser can be made to parse any encoding by invoking the
platform-specific code converter: just compile the sources on, say, a
Linux machine, which handles this correctly, and the compacted yytranslate[]
table will be correct for Unicode. On another platform, invoke the local
code converter from the preferred format to Unicode UTF-n.
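The same idea, decode to Unicode first and then translate, can be sketched in plain C without C++ streams. Below, utf8_next plays the role of the code converter (fixed to UTF-8 input here), and translate stands in for a compacted yytranslate[] table indexed by code point. The function names, the table contents, MAX_CODE, and the UNDEF_SYM value of 2 are assumptions for illustration, not Bison's actual generated code.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CODE 0x7F  /* highest code point the toy grammar uses (assumption) */
#define UNDEF_SYM 2    /* symbol number for "undefined token" (assumption) */

/* Stand-in for a yytranslate[]-style table, keyed by Unicode code point,
   so the same table is valid on any host. Zero entries mean "unused". */
int translate(uint32_t cp)
{
    static const unsigned char table[MAX_CODE + 1] = {
        [0x28] = 5,  /* U+0028 '(' */
        [0x29] = 6,  /* U+0029 ')' */
        [0x2A] = 4,  /* U+002A '*' */
        [0x2B] = 3,  /* U+002B '+' */
    };
    if (cp > MAX_CODE || table[cp] == 0)
        return UNDEF_SYM;
    return table[cp];
}

/* Minimal UTF-8 decoder for n >= 1 bytes: checks lead and continuation
   bytes but does not reject overlong forms. Returns U+FFFD on error. */
uint32_t utf8_next(const unsigned char *s, size_t n, size_t *used)
{
    uint32_t cp;
    size_t len, i;

    if (s[0] < 0x80) { *used = 1; return s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; }
    else { *used = 1; return 0xFFFD; }

    if (n < len) { *used = 1; return 0xFFFD; }
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) { *used = 1; return 0xFFFD; }
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *used = len;
    return cp;
}

/* Convert a UTF-8 buffer to grammar symbol numbers; returns the count. */
size_t utf8_to_symbols(const unsigned char *s, size_t n, int *out)
{
    size_t k = 0, used;
    while (n > 0) {
        out[k++] = translate(utf8_next(s, n, &used));
        s += used;
        n -= used;
    }
    return k;
}
```

Because the table never depends on the compiler's character literals, supporting another input encoding only means swapping the decoder, which is the division of labor described above.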
Hans Aberg