From: Paul Eggert
Subject: Re: [Grep-devel] [bug-gawk] GNU grep, awk, sed: support \u and \U for unicode
Date: Thu, 19 Jan 2017 18:48:59 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1

Assaf Gordon wrote:
> Currently, escape sequences are parsed and converted before being sent to re/dfa. Thus, '[\u0041]' is equivalent to '[A]'.

POSIX requires [\u0041] to be equivalent to [u0041\], that is, it matches any of the characters '\', 'u', '0', '4', and '1'. This is true for grep, sed, and most other utilities that use regular expressions. (awk is an exception.) So except for awk, we can't simply translate \u escapes everywhere. At best we could translate them only if not POSIXLY_CORRECT.
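
The bracket-expression rule above can be checked against the C library's own POSIX regex functions. The following is only a sketch: posix_bre_matches is a hypothetical helper name, not something from grep, sed, or gawk, and it assumes a regcomp that follows POSIX in treating a backslash inside a bracket expression as an ordinary character.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of grep, sed, or gawk): return 1 if the
   POSIX basic regular expression PATTERN matches somewhere in TEXT,
   0 if it does not, and -1 if PATTERN fails to compile.  */
static int
posix_bre_matches (const char *pattern, const char *text)
{
  regex_t re;
  int status;

  assert (pattern != NULL && text != NULL);

  /* REG_NOSUB alone selects a basic regular expression and tells
     regcomp that no submatch offsets are needed.  */
  if (regcomp (&re, pattern, REG_NOSUB) != 0)
    return -1;

  status = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return status == 0;
}
```

With the C string literal "[\\u0041]" (the regular expression [\u0041]), a conforming regcomp should make this helper return 1 for the texts "u", "0", "4", "1", and a lone backslash, and 0 for "A". The same results are expected for "[u0041\\]", which is the equivalence described above.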
On another topic, if we can't implement \N escapes in general then I wouldn't bother with implementing only \N{U+nnnn}.