Re: I think I may have found a regex bug in a recent version of grep?
From: Stepan Kasal
Subject: Re: I think I may have found a regex bug in a recent version of grep?
Date: Tue, 8 Jun 2004 10:53:06 +0200
User-agent: Mutt/1.4.1i
Hello,
On Mon, Jun 07, 2004 at 08:37:47PM -0700, Steve Ingram wrote:
> Let me know if you need any other info or let me know
> if you'd rather not hear from me again :)
I think you understand the difference perfectly.
As far as you knew, you were reporting a possible bug,
and that's OK. So: thank you for your bug report.
> address@hidden > grep -v "\.[a-z]*" data.txt
First, I suggest using single quotes in this situation.
Even though "\." is the same as '\.', backslash is a special character
inside double quotes; thus, to match one backslash, you'd have to write
"\\\\" instead of the simpler '\\'.
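To see the quoting difference directly, here is a minimal sketch for a POSIX shell. The variable names dq and sq are only for illustration; they do not come from the original report.

```shell
# What the shell hands over after quote processing.
dq="\\\\"   # double quotes: each pair of backslashes collapses to one
sq='\\'     # single quotes: nothing is interpreted
printf '%s\n' "$dq" "$sq"   # both lines print two backslashes

# Either spelling gives grep a regex of two backslashes,
# which matches one literal backslash in the input.
printf '%s\n' 'a\b' 'ab' | grep -c '\\'
```

The last command counts the matching lines; only the first of the two sample lines contains a backslash.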
"\.[a-z]*"
Of course, this is equivalent to "\.", as "[a-z]*" means zero or more
occurrences and can thus always match the empty string.
Thus
grep '\.'
matches all lines of data.txt.
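A small sketch of that equivalence follows. The sample lines are hypothetical, since the real data.txt is not shown in this thread, and the variable names are mine.

```shell
# Hypothetical sample input; the real data.txt is not shown in the thread.
sample=$(printf '%s\n' 'file.txt' 'trailing.' 'v1.2' 'nodot')

# [a-z]* may match the empty string, so both patterns select the same lines.
with_star=$(printf '%s\n' "$sample" | grep '\.[a-z]*')
dot_only=$(printf '%s\n' "$sample" | grep '\.')

[ "$with_star" = "$dot_only" ] && echo "same lines selected"
```

Every line that contains a dot is selected by both patterns, including the ones where the dot is followed by a digit or by nothing at all.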
> address@hidden > grep -v "\.[a-z]" data.txt
This is a trick with locales. Your locale is probably set to
"en_US.utf8" and it changes the order of characters from
ABC...Z ... abc...z
to
aAbB...yYzZ
This means that the interval a-z contains all capital letters except
capital Z.
Observe:
$ echo A | LC_ALL=en_US.utf8 grep '[a-z]'
A
$ echo A | LC_ALL=C grep '[a-z]'
$
Thus the fix is to put "export LC_ALL=C" at the beginning of all your bash
scripts (similarly for other shells).
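As a rough sketch, the top of such a script could look like this. The two checks and their variable names are only illustrative; the "|| true" is there because grep exits with a nonzero status when nothing matches.

```shell
# Force the C locale so that [a-z] means exactly the lowercase ASCII letters.
LC_ALL=C
export LC_ALL

# With LC_ALL=C, an uppercase letter is no longer matched by [a-z].
upper_hits=$(echo A | grep -c '[a-z]' || true)
lower_hits=$(echo a | grep -c '[a-z]' || true)
echo "upper: $upper_hits, lower: $lower_hits"   # prints "upper: 0, lower: 1"
```

Setting and exporting on two lines works in any Bourne-style shell; in bash the single line "export LC_ALL=C" does the same.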
I believe this explains all your problems.
Sorry for the inconvenience,
Stepan Kasal