Re: grep -e '\(a\)\1' -e '\(b\)\1'
From: Tom Lord
Subject: Re: grep -e '\(a\)\1' -e '\(b\)\1'
Date: Sun, 18 Feb 2001 23:52:13 -0800 (PST)
> For example, should the following shell command
> output nothing, or output a line containing "b"?
>
>     echo 'b' | grep '\(\(a\)\)*\2'
1003.2-1992 and the latest draft appear to be identical on
this point.
Section 2.8.3.3(3) of 1003.2-1992 makes it unambiguously clear that
grep should output nothing: "b" does not match the pattern. Nor
does the pattern match the empty string.
    The back-reference [...] shall match the same (possibly
    empty) string of characters as was matched [by the corresponding
    ^^^^^^^^^^^^^^
    subexpression].
B.5.2 addresses the question of "what is matched by a parenthesized
subexpression". In that section, a parenthesized expression enclosed in
a "*" expression which matches zero times is said to "not participate
in the match" -- i.e., the parenthesized expression does not match
any string, even an empty string. Thus, the back-reference cannot match
any string at all (even an empty string).
Another way to see this is to consider the `pmatch' output of regexec.
When matching \(\(a\)\)* against the string "b", you must get:

    pmatch[2].rm_so == pmatch[2].rm_eo == -1

There is no substring of characters, even an empty string, in the
string "b" beginning at position -1. Therefore, after \(\(a\)\)*
matches the empty string, \2 cannot match anything at all.
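
For anyone who wants to check the `pmatch' claim against their own libc,
here is a minimal sketch using the POSIX <regex.h> interface. The helper
name group_span, its return codes, and the fixed array size of 10 are
illustrative assumptions, not part of the standard or of any
implementation discussed here. It only reports what regexec recorded for
one subexpression; how a given libc treats the trailing \2 is for you to
observe.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not from the standard): compile `pattern` as a
 * POSIX BRE, run it against `text`, and report the offsets that regexec
 * recorded for subexpression number `group`.
 * Returns 0 if the pattern matched, 1 if it did not match, and -1 if
 * the pattern failed to compile or `group` is out of range. */
int group_span(const char *pattern, const char *text, size_t group,
               long *so, long *eo)
{
    regex_t re;
    regmatch_t pmatch[10];   /* assumed upper bound, enough for \0..\9 */
    int rc;

    if (group >= 10)
        return -1;
    if (regcomp(&re, pattern, 0) != 0)   /* cflags 0 selects BRE syntax */
        return -1;

    rc = regexec(&re, text, 10, pmatch, 0);
    if (rc == 0) {
        /* A subexpression that did not participate in the match is
         * reported with both offsets set to -1. */
        *so = (long)pmatch[group].rm_so;
        *eo = (long)pmatch[group].rm_eo;
    }
    regfree(&re);
    return rc == 0 ? 0 : 1;
}
```

Calling group_span("\\(\\(a\\)\\)*", "b", 2, &so, &eo) should report a
successful (empty) match with so and eo both -1, which is the case
described above. Passing "\\(\\(a\\)\\)*\\2" as the pattern instead shows
whether your regexec accepts or rejects "b".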
Paul Eggert writes:
> In the discussion of BRE back-references, the latest POSIX draft says:
> [....same material quoted above...]
> In my opinion, this does not define the behavior of \(\(a\)\)*\2 when
> the \(a\) never matched a string.
The text you quoted unambiguously does define the correct behavior.
Other sections, for example B.5.2 of 1003.2-1992, make this clear.
In this case: GNU grep is right; Solaris xpg4 is wrong.
> I.e. even though GNU grep
> and Solaris xpg4 grep act differently here, they both conform to
> POSIX.
No. Solaris xpg4 (if it behaves as reported here) does not conform
to the POSIX specification -- though I doubt any POSIX test suites
check for this particular case.
The POSIX regexp standard is not that loose. It leaves some
constructs unspecified (e.g., backreferences in ERE). It specifies
all other cases completely and unambiguously.
The alternative interpretation you propose would lead to some odd
results.
Thomas Lord
regexps.com