Re: /usr/bin/printf: invalid universal character name
From: Jim Meyering
Subject: Re: /usr/bin/printf: invalid universal character name
Date: Sun, 11 May 2008 18:48:05 +0200
Hermann Peifer <address@hidden> wrote:
> Jim wrote:
>> Hermann Peifer <address@hidden> wrote:
>>
>>> printf \uHHHH is expected to print Unicode chars. This works fine in
>>> most cases, but some legal code points are reported as errors: values
>>> in the ASCII range and C1 control chars, and values in the range
>>> U+D800..U+DFFF.
>>>
>>> I would say that this behaviour is a bug rather than a feature.
>>>
>>
>> Thanks for the report, but this is not an arbitrary restriction;
>> it is conformance to the standard (C99, ISO/IEC 10646) for
>> "universal character name" syntax:
>>
>> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n717.htm
>>
>> Here's part of printf.c, with a comment that probably came from
>> a version of N717:
>>
>>   /* A universal character name shall not specify a character short
>>      identifier in the range 00000000 through 00000020, 0000007F through
>>      0000009F, or 0000D800 through 0000DFFF inclusive.  A universal
>>      character name shall not designate a character in the required
>>      character set.  */
>>   if ((uni_value <= 0x9f
>>        && uni_value != 0x24 && uni_value != 0x40 && uni_value != 0x60)
>>       || (uni_value >= 0xd800 && uni_value <= 0xdfff))
>>     error (EXIT_FAILURE, 0, _("invalid universal character name \\%c%0*x"),
>>            esc_char, (esc_char == 'u' ? 4 : 8), uni_value);
>>
>>
>>> /usr/bin/printf: invalid universal character name \u0000
>>> /usr/bin/printf: invalid universal character name \u0001
>>>
>> ...
>>
>> I can understand that you'd find the restriction surprising,
>> but I wouldn't call it a bug.
>>
> Thanks for your swift reply. (BTW: are mails to address@hidden
> not copied to gnu.utils.bug?)
No. That's a separate list.
> I do acknowledge that C0 and C1 control chars are something of a
> border case. It is true that the Unicode standard does not assign
> *normative names* to them, but instead uses the placeholder "<control>"
> as a dummy name (btw, this was different in earlier versions of
> Unicode). However, all C0 and C1 *code points* are at least included
> in:
>
> http://www.unicode.org/charts/PDF/U0000.pdf
> http://www.unicode.org/charts/PDF/U0080.pdf
> http://www.unicode.org/Public/5.1.0/ucd/UnicodeData.txt
>
> And I didn't expect /usr/bin/printf to worry about normative or
> non-normative names of Unicode chars, but rather to print the chars
> themselves.
>
> If we leave the control chars question aside, it is still hard to
> believe that it is not a bug that almost all ASCII chars 0020..007e
> lead to EXIT_FAILURE. This rule is more than peculiar, to say the
> least, and it is also inconsistent with its own comment:
>
>   if ((uni_value <= 0x9f
>        && uni_value != 0x24 && uni_value != 0x40 && uni_value != 0x60)
>
>
> Only DOLLAR SIGN, COMMERCIAL AT and GRAVE ACCENT are legal in the
> range 0x00..0x9f?
>
> I still think that these 92 cases are bugs, rather than anything else:
>
> /usr/bin/printf: invalid universal character name \u0020
> /usr/bin/printf: invalid universal character name \u0021
...
I don't know the motivation for those exceptions.
Paul Eggert added this feature 8 years ago, so things may have changed.
FYI, there are plenty of odd-looking exceptions in this domain.
For a taste, see the function ucn_valid_in_identifier in gcc's
libcpp/charset.c.
That code determines that this is valid C99 code (with -fextended-identifiers):

    int ok\u09CB = 1;

but this is not:

    int not_ok\u09FF = 1;