bug-gnu-utils

Re: possible grep bug: -i switch with character class


From: Chris Jerdonek
Subject: Re: possible grep bug: -i switch with character class
Date: Tue, 3 Nov 2009 23:19:00 -0800

On Tue, Nov 3, 2009 at 1:42 PM, Bob Proulx <address@hidden> wrote:

> Thank you for your report.  I tried to reproduce the issue but believe
> you have insufficient quoting to know for sure exactly what the
> problem is in this case.  It is either insufficient quoting or it is
> your choice of locale which changes the sort ordering.

Thanks for your help, Bob.

> Try this:
>
>  $ echo TK | grep "[A-Z]K"

The quoting doesn't seem to be related to the issue in this case.  The
case-insensitive search still picks up fewer results:

$ echo TK | grep "[A-Z]K"
TK
$ echo TK | grep -i "[A-Z]K"
$

> Also note that the value of LANG, LC_COLLATE, and LC_ALL affect your
> character collation sequence within your locale.

Ah, okay.  Here were my locale settings:

$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

>  Try setting LC_ALL=C

Setting LC_ALL=C does seem to give the expected behavior.  Thanks!
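
The override can also be scoped to a single command instead of being
exported for the whole session.  A minimal sketch, assuming a POSIX
shell and any grep on the PATH (the variable name "result" is only for
illustration); in the C locale the range "[A-Z]" covers exactly the 26
ASCII capitals:

```shell
# Apply the C locale to this one grep invocation only; the rest of the
# session keeps whatever LANG/LC_* settings it already had.
result=$(echo TK | LC_ALL=C grep -i "[A-Z]K")
echo "$result"
```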

> In en_US.UTF-8 for example "[A-Z]" matches "aAbBcC...zZ" and doesn't
> match lower case 'a' because that is the defined collation order for
> that locale.

Okay.  But shouldn't a case-insensitive grep search for "[A-Z]" still
pick up "T"?  It looks like it only picks up "t" and not "T":

$ echo "tK" | grep -i "[A-Z]K"
tK
$ echo "TK" | grep -i "[A-Z]K"
$

Or does grep officially not support locales other than "C"?

In your example above with en_US.UTF-8, if the ordering is aAbB..., it
seems like "[A-Z]" should match "A" but not "a" when case-sensitive,
and a case-insensitive search should pick up strictly more, i.e. both
"a" and "A".  On the other hand, "[a-z]" should match both "a" and "A"
when case-sensitive.
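
One way to see which ordering a locale actually uses is to sort a few
letters under it.  This is only a sketch: it assumes a POSIX shell with
sort, tr and locale available, and the en_US.UTF-8 part runs only if
that locale is installed.  What it prints there depends on the
platform's collation tables, so no particular result is assumed:

```shell
# Byte order (C locale): every upper-case ASCII letter sorts before
# every lower-case one.
c_order=$(printf '%s\n' a A b B | LC_ALL=C sort | tr -d '\n')
echo "C: $c_order"

# Locale order: only attempted when en_US.UTF-8 shows up in `locale -a`
# (spelled "en_US.utf8" on some systems).
if locale -a 2>/dev/null | grep -qiE '^en_US\.utf-?8$'; then
    us_order=$(printf '%s\n' a A b B | LC_ALL=en_US.UTF-8 sort | tr -d '\n')
    echo "en_US.UTF-8: $us_order"
fi
```

If the second line comes out interleaved (lower and upper case of each
letter next to each other), range endpoints such as "A" and "Z" no
longer bracket only capitals.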

>  The shell can demonstrate the issue this way:
>  ...
>  $ touch a A b B c C X x Y y Z z

Hmm, your example doesn't seem to work for me (even with
LC_ALL="en_US.UTF-8").  "[A-Z]" picked up just the upper-case files,
and "[a-z]" just the lower-case ones.  Also, the Mac seems to treat "a"
and "A" as the same file, so the above command created only 6 files
instead of 12, using the first filename in each pair rather than the
second.
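
A quick way to check whether the filesystem folds case, sketched under
the assumption of a POSIX shell with mktemp available (the names "dir"
and "count" are just for illustration):

```shell
# Create "a" and "A" in a scratch directory and count what exists.
dir=$(mktemp -d)
touch "$dir/a" "$dir/A"
count=$(ls "$dir" | wc -l | tr -d ' ')

# One entry means the two names collided; two means they are distinct.
if [ "$count" -eq 1 ]; then
    echo "case-insensitive filesystem"
else
    echo "case-sensitive filesystem"
fi
rm -rf "$dir"
```

On a case-insensitive volume the second name resolves to the file that
already exists, which matches the 6-instead-of-12 result described
above.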

Thanks,
--Chris



