bug-gnu-utils

Re: possible grep bug: -i switch with character class


From: Bob Proulx
Subject: Re: possible grep bug: -i switch with character class
Date: Tue, 3 Nov 2009 14:42:33 -0700
User-agent: Mutt/1.5.18 (2008-05-17)

Chris Jerdonek wrote:
> There seems to be some sort of glitch with character classes and the
> -i (--ignore-case) switch.

Thank you for your report.  I tried to reproduce the issue, but the
pattern in your commands is not quoted, so I cannot tell for sure
exactly what the problem is.  It is either the insufficient quoting or
your choice of locale, which changes the sort ordering.

> Create a text file called "test.txt" that consists of a single line of
> text with the two letters T and K followed by a unix line feed, as
> shown below (also see the attached file):
> 
> TK

Good.  Also note that using 'echo' or 'printf' works well to create
small test cases like this.
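
For instance, a minimal sketch of building the same test case without
an editor.  The scratch directory and the "workdir" variable are only
illustrative assumptions, not part of the original report.

```shell
# Hypothetical scratch directory so the example does not touch real files.
workdir=$(mktemp -d)

# Write the single line "TK" followed by a unix line feed.
printf 'TK\n' > "$workdir/test.txt"

# Or skip the file entirely and pipe the line straight into grep.
printf 'TK\n' | grep 'K'
```

Piping the input in keeps the whole test case on one line, which makes
it easy to paste into a bug report.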

> The case-insensitive search "grep -i [A-Z]K test.txt" finds no matches
> while the same search done case-sensitively (i.e. without the -i
> switch) does find a match.  See below for a console session that shows
> a couple other related searches that work.

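I think you may have insufficiently quoted your pattern.  The brackets
are special characters to the shell.  They are file globbing
characters, so an unquoted [A-Z]K is matched against filenames in the
current directory and, if any match, is replaced by those names before
grep ever sees it.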
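
Here is a rough sketch of that effect, assuming a scratch directory
that happens to contain a file named XK.  The directory, the file name,
and the variable names are made up for the illustration.

```shell
# Hypothetical scratch directory holding one file whose name matches the glob.
globdir=$(mktemp -d)
cd "$globdir" || exit 1
touch XK

# Unquoted: the shell replaces [A-Z]K with the matching filename
# before the command runs.
unquoted=$(echo [A-Z]K)

# Quoted: the pattern is passed through to the command untouched.
quoted=$(echo "[A-Z]K")

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

With the unquoted form grep would be handed XK as its pattern instead
of the bracket expression, so quoting the pattern removes one variable
from the test.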

> $ grep [A-Z]K test.txt
> TK
> $ grep [T]K test.txt
> TK
> $ grep -i [T]K test.txt
> TK
> $ grep -i [^A-Z]K test.txt
> TK
> $ grep -i [A-Z]K test.txt
> $

Try this:

  $ echo TK | grep "[A-Z]K"
  TK

  $ echo TK | grep "[T]K"
  TK

  $ echo TK | grep -i "[T]K"
  TK

  $ echo TK | grep -i "[^A-Z]K"

  $ echo TK | grep -i "[A-Z]K"
  TK

Also note that the values of LANG, LC_COLLATE, and LC_ALL affect the
character collation sequence of your locale.  Try setting LC_ALL=C
to get a standard environment.  The locale command shows your current
settings.

  $ locale
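
As a sketch, LC_ALL can be set for a single command rather than for the
whole session.  The "result" variable is only there for illustration.

```shell
# Set LC_ALL for this one grep only; the rest of the session is unaffected.
result=$(echo TK | LC_ALL=C grep -i "[A-Z]K")
echo "$result"

# Show the locale settings as a command run with the override would see them.
LC_ALL=C locale
```

If the search matches under LC_ALL=C but not under your usual settings,
the locale is the likely cause rather than grep itself.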

In en_US.UTF-8, for example, the collation order is "aAbBcC...zZ", so
"[A-Z]" matches everything from 'A' through 'Z' in that sequence.
That includes the lower case letters 'b' through 'z' but not lower
case 'a'.  The shell can demonstrate the issue this way:

  $ LC_ALL=en_US.UTF-8 bash

  $ mkdir mytestdir && cd ./mytestdir

  $ touch a A b B c C X x Y y Z z

  $ echo [a-z]
  a A b B c C x X y Y z

  $ echo [A-Z]
  A b B c C x X y Y z Z

Setting LC_ALL=C restores a standard sort ordering.  Personally I use
the following settings.  But this won't be valid for every
combination.  YMMV.

  export LANG=en_US.UTF-8
  export LC_COLLATE=C
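
One way to check that such a combination took effect is to ask locale
for the collation category.  This is only a sketch.  It assumes LC_ALL
is not set, because LC_ALL overrides both LANG and LC_COLLATE, and it
runs in a command substitution so the current session is left alone.
The "collate" variable is hypothetical.

```shell
# Apply the two settings to a single locale invocation and keep only
# the LC_COLLATE line.  LC_ALL is unset first because it would win.
collate=$(unset LC_ALL; LANG=en_US.UTF-8 LC_COLLATE=C locale | grep '^LC_COLLATE=')
echo "$collate"
```

The line printed should name C for LC_COLLATE even though LANG names
en_US.UTF-8.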

Hope that helps,
Bob



