bug-grep

bug#44351: Bug in grep v3.2 onwards in regular expression matching


From: Jim Meyering
Subject: bug#44351: Bug in grep v3.2 onwards in regular expression matching
Date: Sat, 31 Oct 2020 10:30:34 -0700

On Sat, Oct 31, 2020 at 9:17 AM Gonzalo Padrino
<grimalg.on+gnu@gmail.com> wrote:
> While using GNU grep v3.4 in an Ubuntu 20.04 userspace running on top of
> Win10 WSL (yeah, I know... but also checked in other envs) I discovered
> what seems like an obvious bug (if I'm not mistaken).
>   The bug:
> -----
> me@host:~$  echo 'xxxxy' |grep -E '^x+x+x+x+y$'
> xxxxy
> me@host:~$  echo 'xxxy' |grep -E '^x+x+x+x+y$'
> xxxy
> me@host:~$  echo 'xxy' |grep -E '^x+x+x+x+y$'
> xxy
> me@host:~$  echo 'xy' |grep -E '^x+x+x+x+y$'
>
> ----
> ...the terminal supports ANSI color escapes, and what's really weird is
> that only the result from the first command is colored in red. First and
> fourth commands yield correct results; the second and third do not, as they
> should not match its input.
>
>   I've tested releases from v3.1 to latest v3.5 and found the anomalous
> behaviour in version v3.2 through v3.5. A (quick and clunky) git bisect led
> me to believe it was introduced about two years ago, possibly in commit
> 123620af88f55c3e0cc9f0aed7311c72f625bc82 (
> https://git.savannah.gnu.org/cgit/grep.git/commit/?id=123620af88f55c3e0cc9f0aed7311c72f625bc82).
> If this is true, it would mean either the bug is in gnulib, or maybe grep
> needed to do some kind of extra handling on its side.

Thank you for reporting that. I confirm this is a bug in the very latest.
This mistakenly matches:
  $ echo xxy |grep -E '^x+x+x+y$'
  xxy

That regular expression requires that any match have at least three
leading 'x's.

This is indeed due to a bug in gnulib's lib/dfa.c.

So far, I've found that we can band-aid fix it by disabling part of
merge_nfa_state's optimizations with this patch, but I do not propose
to make this change. This is just to show where the problem lies. I'm
pretty sure we can retain and correct the optimization.

diff --git a/lib/dfa.c b/lib/dfa.c
index 74aafa2ee..087c266c5 100644
--- a/lib/dfa.c
+++ b/lib/dfa.c
@@ -2459,7 +2459,7 @@ merge_nfa_state (struct dfa *d, idx_t tindex, char *flags,
                 continue;

               if (flags[sindex] & OPT_REPEAT)
-                delete (sindex, &follows[sindex]);
+                continue;

               merge2 (&follows[dindex], &follows[sindex], merged);
