bug-sed



From: Jim Meyering
Subject: bug#68725: GNU grep and sed behaving unexpectedly with multiple 1-or-0 RE capture groups and backreferences
Date: Mon, 5 Feb 2024 23:02:42 -0800

On Fri, Jan 26, 2024 at 6:51 AM Ed Morton <mortoneccc@comcast.net> wrote:
>
> There are issues (mostly common but some not) using a regexp like this:
>
>     ^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$
>
> with GNU grep and GNU sed, hence my contacting both mailing lists but
> apologies if that was the wrong starting point.
>
> This started out as a question on StackOverflow,
> (https://stackoverflow.com/questions/77820540/searching-palindromes-with-grep-e-egrep/77861446?noredirect=1#comment137299746_77861446)
> but my "answer" and some comments from there are copied below so you don't
> have to look anywhere else for a description of the issues.
>
> Given this input file:
>
>     a
>     ab
>     abba
>     abcdef
>     abcba
>     zufolo
>
> Removing the `$` from the end of the regexp (i.e. making it less
> restrictive) produces fewer matches, which is the opposite of what it
> should do:
>
> a) With the `$` at the end of the regexp:
>
>     $ grep -E '^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$' sample
>     a
>     abba
>     abcba
>     zufolo
>
> b) Without the `$` at the end of the regexp:
>
>     $ grep -E '^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1' sample
>     a
>     abba
>     abcba

Thanks for reporting that. This is as far as I've gotten for now, but
this sure looks like a bug:

  $ echo zufolo | grep -E '^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$'
  zufolo

Obviously, that string should not match.

Note that it works properly with the -P option in place of that -E.
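
For comparison, here is a regex-free sketch of what the anchored pattern ought to accept. Five optional captures, one optional middle character, and five backreferences can cover at most 11 characters, so the expected matches are the palindromes of length 11 or less. The helper names `reverse`, `is_short_palindrome`, and `expected_matches` are hypothetical and are not part of grep or sed; this is only a reference check under that reading of the regexp.

```shell
#!/bin/sh
# Hypothetical helper: print $1 reversed, using only POSIX parameter expansion.
# Note: POSIX sh has no "local", so these variables are global.
reverse() {
  s=$1
  rev=
  while [ -n "$s" ]; do
    rest=${s#?}              # everything after the first character
    rev=${s%"$rest"}$rev     # prepend the first character
    s=$rest
  done
  printf '%s' "$rev"
}

# Hypothetical helper: succeed if $1 is a palindrome of at most 11 characters,
# the longest string the regexp from this thread can span.
is_short_palindrome() {
  [ "${#1}" -le 11 ] && [ "$1" = "$(reverse "$1")" ]
}

# Read lines on stdin and print those the anchored regexp should match.
expected_matches() {
  while IFS= read -r line; do
    if is_short_palindrome "$line"; then
      printf '%s\n' "$line"
    fi
  done
}

# The sample input from the report.
printf '%s\n' a ab abba abcdef abcba zufolo | expected_matches
```

On the sample input this prints only `a`, `abba`, and `abcba`, so `zufolo` is the one unexpected line in the anchored `grep -E` output above.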

> It's not just
> GNU grep that behaves strangely, GNU sed has the same behavior from the
> question when just matching with `sed -nE '/.../p' sample` as GNU `grep`
> does AND sed behaves differently if we're just doing a match vs if we're
> doing a match + replace. For example here's `sed` doing a
> match+replacement and behaving the same way as `grep` above:
>
> a) With the `$` at the end of the regexp:
>
>     $ sed -nE 's/^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$/&/p' sample
>     a
>     abba
>     abcba
>     zufolo
>
> b) Without the `$` at the end of the regexp:
>
>     $ sed -nE 's/^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1/&/p' sample
>     a
>     abba
>     abcba
>
> but here's sed just doing a match and behaving differently from any of
> the above:
>
> a) With the `$` at the end of the regexp (note the extra `ab` in the
> output):
>
>     $ sed -nE '/^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$/p' sample
>     a
>     ab
>     abba
>     abcba
>     zufolo
>
> b) Without the `$` at the end of the regexp (note the extra `ab` and
> `abcdef` in the output):
>
>     $ sed -nE '/^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1/p' sample
>     a
>     ab
>     abba
>     abcdef
>     abcba
>     zufolo
>
> Also interestingly this:
>
>     $ sed -nE 's/^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$/<&>/p' sample
>
> outputs:
>
>     <a>
>     <abba>
>     <abcba>
>     <>zufolo
>
> the last line of which means the regexp is apparently matching the start
> of the line and ignoring the `$` end-of-string metachar present in the
> regexp!
>
> The odd behavior isn't just associated
> with using `-E`, though, if I remove `-E` and just use [POSIX compliant
> BREs](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03)
> then:
>
> a) With the `$` at the end of the regexp:
>
>     $ grep '^\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\).\{0,1\}\5\4\3\2\1$' sample
>     a
>     abba
>     abcba
>     zufolo
>
>     $ sed -n 's/^\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\).\{0,1\}\5\4\3\2\1$/&/p' sample
>     a
>     abba
>     abcba
>     zufolo
>
> b) Without the `$` at the end of the regexp:
>
>     $ grep '^\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\).\{0,1\}\5\4\3\2\1' sample
>     a
>     abba
>     abcba
>
>     $ sed -n 's/^\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\).\{0,1\}\5\4\3\2\1/&/p' sample
>     a
>     abba
>     abcba
>
> and again just doing a match in sed below behaves differently from the
> sed match+replacements above:
>
> a) With the `$` at the end of the regexp:
>
>     $ sed -n '/^\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\).\{0,1\}\5\4\3\2\1$/p' sample
>     a
>     ab
>     abba
>     abcba
>     zufolo
>
> b) Without the `$` at the end of the regexp:
>
>     $ sed -n '/^\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\).\{0,1\}\5\4\3\2\1/p' sample
>     a
>     ab
>     abba
>     abcdef
>     abcba
>     zufolo
>
> The above shows that, given the same regexp, sed is apparently matching
> different strings depending on whether it's doing a substitution or not.
>
> These are the versions I was
> using when testing above:
>
>     $ grep --version | head -1
>     grep (GNU grep) 3.11
>     $ sed --version | head -1
>     sed (GNU sed) 4.9
>
> It was later pointed out that grep in git-bash produces an error message
> and core dumps given the original regexp above, e.g.
>
>     grep -E '^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1' sample
>
> and
>
>     grep -E '^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$' sample
>
> both output:
>
>     a
>     assertion "num >= 0" failed: file "regexec.c", line 1394, function: pop_fail_stack
>     Aborted (core dumped)
>
> Sorry, I can't copy the core off that machine for corporate reasons.
> Those git-bash tests were using
>
>     $ echo $BASH_VERSION
>     5.2.15(1)-release
>     $ grep --version
>     grep (GNU grep) 3.0
>
> Regards,
> Ed Morton
>
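
The BRE commands quoted above are a mechanical rewrite of the ERE, with `(.?)` becoming `\(.\{0,1\}\)` and `.?` becoming `.\{0,1\}`. A small sketch of that rewrite follows, in case it helps anyone reproduce both forms from one pattern. The function name `ere_to_bre` is hypothetical, and it handles only the two constructs this particular regexp uses; it is not a general ERE-to-BRE converter.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the ERE from this thread into BRE notation.
# Handles only "(.?)" and ".?"; anything else, including the backreferences,
# passes through unchanged.
ere_to_bre() {
  printf '%s\n' "$1" |
    sed -e 's/(\.?)/\\(.\\{0,1\\}\\)/g' \
        -e 's/\.?/.\\{0,1\\}/g'
}

ere='^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$'
ere_to_bre "$ere"
```

The sed expressions here are themselves BREs, where `(`, `)`, and `?` are literal characters, and they contain no backreferences, so they do not rely on the behavior under discussion.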




