Re: grep -e '\(a\)\1' -e '\(b\)\1'
From: Paul Eggert
Subject: Re: grep -e '\(a\)\1' -e '\(b\)\1'
Date: Sun, 18 Feb 2001 17:22:42 -0800 (PST)

> From: "Alain Magloire" <address@hidden>
> Date: Sun, 18 Feb 2001 15:26:36 -0500 (EST)
>
> > > > echo ba | egrep '(a)\1|(b)\1'
> It will match "aaba".
>
> Now I do not know what POSIX.2 says about this.
POSIX doesn't say anything about ERE back-references; the committee
discussed the idea but rejected standardization.

However, it may amuse you to know that a similar problem occurs with
BRE back-references.  For example, should the following shell command
output nothing, or output a line containing "b"?

    echo 'b' | grep '\(\(a\)\)*\2'

GNU grep outputs nothing, but Solaris 8 xpg4 grep outputs "b".
This is because GNU grep says \2 does not match if the corresponding
subexpression never matched, but Solaris 8 xpg4 grep says \2 matches
the empty string in that case.
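
To see which of the two interpretations a particular grep follows, the
command above can be wrapped in a small script.  This is only a sketch:
the function name unset_backref_behavior and its two output labels are
invented here for illustration, and it assumes a POSIX shell with some
grep on the PATH.

```shell
# Hypothetical helper; the name and the labels are illustrative only.
# Prints "empty-match" if the local grep selects the line "b", i.e. it
# treats \2 as matching the empty string when \(a\) never matched
# (the Solaris 8 xpg4 behavior described above).
# Prints "no-match" if grep selects nothing (the GNU grep behavior).
# A grep error (exit status 2) is also reported as "no-match".
unset_backref_behavior() {
    if echo 'b' | grep '\(\(a\)\)*\2' >/dev/null 2>&1; then
        echo 'empty-match'
    else
        echo 'no-match'
    fi
}

unset_backref_behavior
```

Either answer is possible depending on the grep installed, so the
script reports the behavior rather than assuming one.
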
In the discussion of BRE back-references, the latest POSIX draft says:

    The back-reference expression '\n' shall match the same (possibly
    empty) string of characters as was matched by a subexpression
    enclosed between "\(" and "\)" preceding the '\n'.... When the
    referenced subexpression matched more than one string, the
    back-referenced expression shall refer to the last matched string.

    If the subexpression referenced by the back-reference matches more
    than one string because of an asterisk ('*') or an interval
    expression (see item (5)), the back-reference shall match the last
    (rightmost) of these strings.

In my opinion, this does not define the behavior of \(\(a\)\)*\2 when
the \(a\) never matched a string.  Hence POSIX does not define the
behavior of "grep" on the above example.  That is, even though GNU grep
and Solaris 8 xpg4 grep act differently here, they both conform to
POSIX.  (Personally, I prefer the GNU grep behavior. :-)