Re: [PATCH 10/11] quote consistently and make tests pass with new quoting from gnulib


From: Akim Demaille
Subject: Re: [PATCH 10/11] quote consistently and make tests pass with new quoting from gnulib
Date: Mon, 23 Jan 2012 16:06:23 +0100

On 23 Jan 2012, at 15:34, Jim Meyering wrote:

>> I had never realized that the tests are not specifying LC_ALL=C
>> and they should.  But even when I do, I still have nice quotes.
> 
> Hi Akim,
> 
> Maybe you need to set LANG to empty or to C?
> glibc honors LANG (erroneously, imho)

My tests were on OS X.  LANG=C, or unset, does not
change anything.

Some digging led me into this:

> # if defined DARWIN7
>       /* To avoid the trouble of installing a file that is shared by many
>          GNU packages -- many packaging systems have problems with this --,
>          simply inline the aliases here.  */
>       cp = "ISO8859-1" "\0" "ISO-8859-1" "\0"
>            "ISO8859-2" "\0" "ISO-8859-2" "\0"
>            "ISO8859-4" "\0" "ISO-8859-4" "\0"
>            "ISO8859-5" "\0" "ISO-8859-5" "\0"
>            "ISO8859-7" "\0" "ISO-8859-7" "\0"
>            "ISO8859-9" "\0" "ISO-8859-9" "\0"
>            "ISO8859-13" "\0" "ISO-8859-13" "\0"
>            "ISO8859-15" "\0" "ISO-8859-15" "\0"
>            "KOI8-R" "\0" "KOI8-R" "\0"
>            "KOI8-U" "\0" "KOI8-U" "\0"
>            "CP866" "\0" "CP866" "\0"
>            "CP949" "\0" "CP949" "\0"
>            "CP1131" "\0" "CP1131" "\0"
>            "CP1251" "\0" "CP1251" "\0"
>            "eucCN" "\0" "GB2312" "\0"
>            "GB2312" "\0" "GB2312" "\0"
>            "eucJP" "\0" "EUC-JP" "\0"
>            "eucKR" "\0" "EUC-KR" "\0"
>            "Big5" "\0" "BIG5" "\0"
>            "Big5HKSCS" "\0" "BIG5-HKSCS" "\0"
>            "GBK" "\0" "GBK" "\0"
>            "GB18030" "\0" "GB18030" "\0"
>            "SJIS" "\0" "SHIFT_JIS" "\0"
>            "ARMSCII-8" "\0" "ARMSCII-8" "\0"
>            "PT154" "\0" "PT154" "\0"
>          /*"ISCII-DEV" "\0" "?" "\0"*/
>            "*" "\0" "UTF-8" "\0";
> # endif

which, IIUC, maps my "US-ASCII" (which is the
answer on my system for locale_codeset in locale_charset)
to UTF-8.  And then, it seems to be hard-coded to use UTF-8
quotes in quotearg.
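
To make the mapping concrete, here is a minimal sketch of how such a
NUL-separated alias table can be walked.  It is not the gnulib code:
the names sample_aliases and resolve_charset are hypothetical, the table
is shortened, and I am assuming that a "*" alias matches any codeset.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, shortened table in the same layout as the quoted one:
   NUL-separated (alias, canonical) pairs.  The implicit terminator of
   the string literal acts as the empty string that ends the table.  */
static const char sample_aliases[] =
  "ISO8859-1" "\0" "ISO-8859-1" "\0"
  "eucJP" "\0" "EUC-JP" "\0"
  "*" "\0" "UTF-8" "\0";

/* Return the canonical name for CODESET.  The first pair whose alias
   equals CODESET, or whose alias is the wildcard "*", wins.  If no
   pair matches, CODESET is returned unchanged.  */
static const char *
resolve_charset (const char *codeset, const char *aliases)
{
  assert (codeset != NULL && aliases != NULL);
  while (*aliases != '\0')
    {
      /* The canonical name starts right after the alias's NUL.  */
      const char *canonical = aliases + strlen (aliases) + 1;
      if (strcmp (codeset, aliases) == 0
          || (aliases[0] == '*' && aliases[1] == '\0'))
        return canonical;
      /* Skip to the next (alias, canonical) pair.  */
      aliases = canonical + strlen (canonical) + 1;
    }
  return codeset;
}
```

Under that assumption, resolve_charset ("US-ASCII", sample_aliases)
falls through to the final "*" pair and yields "UTF-8", which matches
what I observe.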

> /* MSGID approximates a quotation mark.  Return its translation if it
>    has one; otherwise, return either it or "\"", depending on S.
> 
>    S is either clocale_quoting_style or locale_quoting_style.  */
> static char const *
> gettext_quote (char const *msgid, enum quoting_style s)
> {
>   char const *translation = _(msgid);
>   char const *locale_code;
> 
>   if (translation != msgid)
>     return translation;
> 
>   /* For UTF-8 and GB-18030, use single quotes U+2018 and U+2019.
>      Here is a list of other locales that include U+2018 and U+2019:
> 
>         ISO-8859-7   0xA1                 KOI8-T       0x91
>         CP869        0x8B                 CP874        0x91
>         CP932        0x81 0x65            CP936        0xA1 0xAE
>         CP949        0xA1 0xAE            CP950        0xA1 0xA5
>         CP1250       0x91                 CP1251       0x91
>         CP1252       0x91                 CP1253       0x91
>         CP1254       0x91                 CP1255       0x91
>         CP1256       0x91                 CP1257       0x91
>         EUC-JP       0xA1 0xC6            EUC-KR       0xA1 0xAE
>         EUC-TW       0xA1 0xE4            BIG5         0xA1 0xA5
>         BIG5-HKSCS   0xA1 0xA5            EUC-CN       0xA1 0xAE
>         GBK          0xA1 0xAE            Georgian-PS  0x91
>         PT154        0x91
> 
>      None of these is still in wide use; using iconv is overkill.  */
>   locale_code = locale_charset ();
>   fprintf (stderr, "charset: %s\n", locale_code);

I get "charset: UTF-8".

>   if (STRCASEEQ (locale_code, "UTF-8", 'U','T','F','-','8',0,0,0,0))
>     return msgid[0] == '`' ? "\xe2\x80\x98": "\xe2\x80\x99";
>   if (STRCASEEQ (locale_code, "GB18030", 'G','B','1','8','0','3','0',0,0))
>     return msgid[0] == '`' ? "\xa1\xae": "\xa1\xaf";
> 
>   return (s == clocale_quoting_style ? "\"" : "'");
> }


My understanding is that there is nothing prepared for me to override
this, since bison is using:

> /* Return an unambiguous printable representation of NAME,
>    allocated in slot N, suitable for diagnostics.  */
> char const *
> quote_n (int n, char const *name)
> {
>   return quotearg_n_style (n, locale_quoting_style, name);
> }

I could add some dependency on LC_ALL here, but it looks wrong.
It feels wrong that even with LC_CTYPE=C, I get UTF-8.
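
For the record, the kind of dependency I have in mind would look roughly
like the sketch below.  Everything here is hypothetical: choose_quote and
ctype_is_c_locale are names invented for illustration, the charset match
is an exact strcmp rather than the STRCASEEQ of the quoted code, and only
the UTF-8 case is handled.

```c
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when the LC_CTYPE category currently
   names the C (or POSIX) locale.  */
static bool
ctype_is_c_locale (void)
{
  const char *name = setlocale (LC_CTYPE, NULL);
  return name == NULL
         || strcmp (name, "C") == 0
         || strcmp (name, "POSIX") == 0;
}

/* Hypothetical variant of the quote selection: fall back to an ASCII
   quote when CTYPE_IS_C is true, whatever CHARSET claims.  */
static const char *
choose_quote (const char *charset, bool ctype_is_c, bool opening)
{
  /* The guard that the quoted gettext_quote does not have.  */
  if (ctype_is_c)
    return "'";
  /* U+2018 and U+2019 encoded as UTF-8.  */
  if (strcmp (charset, "UTF-8") == 0)
    return opening ? "\xe2\x80\x98" : "\xe2\x80\x99";
  return "'";
}
```

A caller would pass ctype_is_c_locale () as the second argument.  It
would work around the symptom, but it duplicates locale logic that
arguably belongs in locale_charset.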


>> (And for my information, why is this function not-pure ?
> 
> Because it may fail to return normally (abort).

Yes, I see that, but I found nothing against this in GCC's
documentation, and given the kind of optimization involved, I don't
see what can actually go wrong.

http://gcc.gnu.org/onlinedocs/gcc-4.6.2/gcc/Function-Attributes.html
> pure
> Many functions have no effects except the return value and their return value 
> depends only on the parameters and/or global variables. Such a function can 
> be subject to common subexpression elimination and loop optimization just as 
> an arithmetic operator would be. These functions should be declared with the 
> attribute pure. For example,
>           int square (int) __attribute__ ((pure));
>      
> 
> says that the hypothetical function square is safe to call fewer times than 
> the program says.
> 
> Some of common examples of pure functions are strlen or memcmp. Interesting 
> non-pure functions are functions with infinite loops or those depending on 
> volatile memory or other system resource, that may change between two 
> consecutive calls (such as feof in a multithreading environment).
> 
> The attribute pure is not implemented in GCC versions earlier than 2.96. 

(And I didn't find any bug report about the bad warning :)
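
To illustrate the case I am asking about, here is a small sketch of a
function declared pure that may nevertheless abort.  The names
checked_length and twice_length are hypothetical and not taken from
bison or gnulib; whether such a declaration is legitimate is exactly
the open question.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical function: its result depends only on its argument and
   the memory it points to, but it does not return on bad input.  */
static size_t checked_length (const char *s) __attribute__ ((pure));

static size_t
checked_length (const char *s)
{
  if (s == NULL)
    abort ();
  return strlen (s);
}

/* With the attribute, the compiler is allowed to evaluate
   checked_length (s) once here and reuse the result.  */
static size_t
twice_length (const char *s)
{
  return checked_length (s) + checked_length (s);
}
```

If the two calls are merged, the program aborts once instead of once,
or returns the same sum, so I do not see an observable difference in
this case.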



