Re: possible grep bug: -i switch with character class
From: Chris Jerdonek
Subject: Re: possible grep bug: -i switch with character class
Date: Tue, 3 Nov 2009 23:19:00 -0800
On Tue, Nov 3, 2009 at 1:42 PM, Bob Proulx <address@hidden> wrote:
> Thank you for your report. I tried to reproduce the issue but believe
> you have insufficient quoting to know for sure exactly what the
> problem is in this case. It is either insufficient quoting or it is
> your choice of locale which changes the sort ordering.
Thanks for your help, Bob.
> Try this:
>
> $ echo TK | grep "[A-Z]K"
The quoting doesn't seem to be related to the issue in this case. The
case-insensitive search still picks up fewer results than the
case-sensitive one:
$ echo TK | grep "[A-Z]K"
TK
$ echo TK | grep -i "[A-Z]K"
$
> Also note that the value of LANG, LC_COLLATE, and LC_ALL affect your
> character collation sequence within your locale.
Ah, okay. Here were my locale settings:
$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
> Try setting LC_ALL=C
Setting LC_ALL=C does seem to give the expected behavior. Thanks!
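For reference, here is a minimal sketch of that comparison, assuming a POSIX shell (the variable names are only for illustration). Putting LC_ALL=C in front of grep changes the locale for that one command and leaves the rest of the session alone. The last line uses the POSIX class [[:upper:]] instead of a range, which should sidestep the collation question entirely.

```shell
# Run both searches with the C locale applied to grep only,
# so that [A-Z] means the plain ASCII range A..Z.
sensitive=$(echo TK | LC_ALL=C grep "[A-Z]K")
insensitive=$(echo TK | LC_ALL=C grep -i "[A-Z]K")

echo "case-sensitive:   $sensitive"
echo "case-insensitive: $insensitive"

# A character class rather than a range; it does not rely on
# how the locale orders the letters between A and Z.
by_class=$(echo TK | grep "[[:upper:]]K")
echo "with [[:upper:]]: $by_class"
```

In the C locale both of the first two searches should print TK.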
> In en_US.UTF-8 for example "[A-Z]" matches "aAbBcC...zZ" and doesn't
> match lower case 'a' because that is the defined collation order for
> that locale.
Okay. But shouldn't a case insensitive grep search for "[A-Z]" still
pick up "T"? It looks like it only picks up "t" and not "T":
$ echo "tK" | grep -i "[A-Z]K"
tK
$ echo "TK" | grep -i "[A-Z]K"
$
Or does grep officially not support locales other than "C"?
In your example above with en_US.UTF-8, if the ordering is aAbB..., it
seems like "[A-Z]" should match "A" but not "a" when case sensitive.
And a case insensitive search should pick up strictly more -- i.e.
both "a" and "A". On the other hand, "[a-z]" should match both "a"
and "A" when case sensitive.
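The expectation above (anything matched without -i should also be matched with -i) can be checked mechanically. This is only a sketch: check_superset is a hypothetical helper name, and it runs grep in whatever locale is currently set, so the result may differ between locales and grep versions.

```shell
# check_superset PATTERN STRING...
# Prints every STRING that matches PATTERN case-sensitively
# but fails to match when -i is added.
# (Hypothetical helper; uses the caller's current locale.)
check_superset() {
    pattern=$1
    shift
    for s in "$@"; do
        if printf '%s\n' "$s" | grep -q -e "$pattern" &&
           ! printf '%s\n' "$s" | grep -qi -e "$pattern"; then
            printf 'matched without -i but not with -i: %s\n' "$s"
        fi
    done
}

# Prints nothing if -i behaves as a superset for these inputs.
check_superset "[A-Z]K" TK tK AK aK
```

Any line it prints is a string that the case-insensitive search dropped, which is the behavior reported above for "TK".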
> The shell can demonstrate the issue this way:
> ...
> $ touch a A b B c C X x Y y Z z
Hmm, it doesn't seem like your example worked (even with
LC_ALL="en_US.UTF-8"). "[A-Z]" picked up just the upper case files,
and "[a-z]" just the lower case. Also, Mac seems to treat "a" and "A"
as the same file. So the above command created only 6 files instead
of 12 -- using the first filename in each case rather than the second.
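A quick way to tell whether a given directory sits on a case-insensitive filesystem, which would explain the missing files. This is a sketch that assumes mktemp -d is available (it is on Mac and on GNU systems); the variable names are illustrative.

```shell
# Create a scratch directory, make a lower-case file, and see
# whether the upper-case name resolves to the same file.
tmpdir=$(mktemp -d)
touch "$tmpdir/a"

if [ -e "$tmpdir/A" ]; then
    fs_case="insensitive"
else
    fs_case="sensitive"
fi

echo "filesystem under $tmpdir is case-$fs_case"
rm -rf "$tmpdir"
```

Note that this tests the filesystem holding the temporary directory, which is not necessarily the one where the touch command above was run.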
Thanks,
--Chris