bug-gnu-utils

Re: possible grep bug: -i switch with character class


From: Bob Proulx
Subject: Re: possible grep bug: -i switch with character class
Date: Wed, 4 Nov 2009 01:50:26 -0700
User-agent: Mutt/1.5.18 (2008-05-17)

Chris Jerdonek wrote:
> > $ echo TK | grep "[A-Z]K"
> 
> The quoting doesn't seem related to the issue in this case.  The case
> insensitive search still picks up fewer results--

Hmm...  Curiouser and curiouser.

> $ echo TK | grep "[A-Z]K"
> TK
> $ echo TK | grep -i "[A-Z]K"
> $

That does seem incorrect.  Since you reported that you are using the
grep 2.5.1 that ships with Mac OS X, I would report this problem to
them.  This may be something Mac specific.  It doesn't seem to me to
be a problem in the upstream source.

Alternatively I would grab the latest stable grep sources and compile
them yourself.  Grep compiles easily and shouldn't be a problem to
build from source.  (Although I don't use a Mac and so have no first
hand knowledge there.)  Searching the web shows me a darwinports page
specifically for this.  The GNU grep source is here:

  ftp://ftp.gnu.org/gnu/grep/

If it works for you compiled from upstream source and not from the
stock Mac version then it is probably a problem specific to the Mac
version that we wouldn't know about here.  And if it doesn't work that
way then we have something that we could work through.
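
One way to make that comparison, as a rough sketch only: try_grep is a
hypothetical helper name, and the path you pass it is wherever your
freshly built grep ends up (I am not assuming any particular location).

```shell
# Run the failing case against one specific grep binary and report.
# Usage: try_grep /path/to/grep
try_grep() {
  # Show which grep this is; some non-GNU greps may not support --version.
  "$1" --version | head -n 1
  if echo TK | "$1" -i '[A-Z]K' >/dev/null; then
    echo "$1: matches TK"
  else
    echo "$1: does NOT match TK"
  fi
}

# The stock grep found on PATH; repeat with the path to your own build.
try_grep "$(command -v grep)"
```

Running it once for the stock grep and once for the self-compiled one,
in the same locale, should show whether the two disagree.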

> >  Try setting LC_ALL=C
> 
> Setting LC_ALL=C does seem to give the expected behavior.  Thanks!

Oh good.  At least you have some progress to show.  You will see
LC_ALL=C used a lot these days to get the traditional ASCII ordering.
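
A minimal sketch of the usual idiom, assuming a POSIX shell: prefixing
a single command with LC_ALL=C changes the locale for that command
only, and in the C locale "[A-Z]" means just the 26 ASCII upper case
letters.

```shell
# LC_ALL=C applies only to the grep it prefixes, not to the whole session.
# Case-sensitive: only the line starting with upper case T should match.
printf 'TK\ntK\n' | LC_ALL=C grep '[A-Z]K'
# Case-insensitive: both lines should match.
printf 'TK\ntK\n' | LC_ALL=C grep -i '[A-Z]K'
```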

> > In en_US.UTF-8 for example "[A-Z]" matches "aAbBcC...zZ" and doesn't
> > match lower case 'a' because that is the defined collation order for
> > that locale.
> 
> Okay.  But shouldn't a case insensitive grep search for "[A-Z]" still
> pick up "T"?  It looks like it only picks up "t" and not "T":

I would think so, yes.  I was simply trying to describe the collation
sequence.  Otherwise it is just too confusing.

> $ echo "tK" | grep -i "[A-Z]K"
> tK
> $ echo "TK" | grep -i "[A-Z]K"
> $

I can't recreate that behavior.  (Works for me.)
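
For anyone who wants to compare results, here is a rough sketch of the
check run under more than one locale.  check_locales is a hypothetical
helper name, and the locale names are assumptions; "locale -a" lists
what is actually installed on your system.

```shell
# Try the same case-insensitive pattern under each locale and each input,
# and report whether grep matched.
check_locales() {
  for loc in C en_US.UTF-8; do
    for input in TK tK; do
      if printf '%s\n' "$input" | LC_ALL=$loc grep -qi '[A-Z]K'; then
        echo "$loc $input: match"
      else
        echo "$loc $input: NO match"
      fi
    done
  done
}

check_locales
```

I would expect a match on every line.  A "NO match" for TK under one
locale only would point at the locale handling rather than at the
pattern.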

> > The shell can demonstrate the issue this way:
> > ...
> > $ touch a A b B c C X x Y y Z z
> 
> Hmm, it doesn't seem like your example worked (even with
> LC_ALL="en_US.UTF-8").  "[A-Z]" picked up just the upper case files,
> and "[a-z]" just the lower case.  Also, Mac seems to treat "a" and "A"
> as the same file.  So the above command created only 6 files instead
> of 12 -- using the first filename in each case rather than the second.

Oh wow.  I didn't expect that.  Good information for me.  But at least
you figured out the point I was attempting to make.
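
Here is a sketch of the same demonstration that should survive a case
insensitive file system, since no two of the names differ only by
case.  glob_range is a hypothetical helper name and en_US.UTF-8 is an
assumption about what is installed.  What the second call prints
depends on the shell and the locale data, so I won't promise a result
for it.

```shell
# Show what the glob [A-Z] expands to under a given locale.
# The body runs in a subshell, so the locale and directory changes
# do not leak into the calling shell.
glob_range() (
  LC_ALL=$1; export LC_ALL
  # Scratch directory so that nothing else gets globbed.
  dir=$(mktemp -d) && cd "$dir" || exit 1
  touch a B c D
  echo [A-Z]
  cd / && rm -rf "$dir"
)

# ASCII range: only the upper case names B and D.
glob_range C
# A collating locale may also pick up the lower case c.
glob_range en_US.UTF-8
```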

I am sorry but I think you will need to do some debugging in order to
get to the root cause of the problem.  It seems to work correctly on
GNU systems.  This problem seems to be something specific to the Mac.

Also, posting to bug-gnu-utils about grep is okay.  It is a generic
catchall for all things GNU.  You will have people like me here
helping.  But there is a specific mailing list just for grep bugs.
For more follow-up you would reach the grep maintainers directly at
address@hidden instead of here.  That is the bug reporting address
listed in the grep --help output.  More folks there know the internals
of the source code intimately and should be able to provide even more
help.  You can search archives of bug-grep here:

  http://lists.gnu.org/archive/html/bug-grep/

And in fact looking just now I find a copy of your bug report there.
Good stuff!  So whether you realized it or not you had already posted
there. :-)

Hope this helps,

Bob



