bug-grep

bug#34053: [PATCH] grep: fix slow for multiple word matching


From: Norihiro Tanaka
Subject: bug#34053: [PATCH] grep: fix slow for multiple word matching
Date: Wed, 27 Nov 2019 07:36:21 +0900

On Sun, 13 Jan 2019 08:45:47 +0900
Norihiro Tanaka <address@hidden> wrote:

> Hi,
> 
> grep uses the KWset matcher for multiple word matching.  It is very slow when
> most of the parts that match a pattern are not words.  So, if the first part
> matched by a pattern is not a word, use the grep matcher to match that line.
> 
> By the way, if START_PTR is set, the grep matcher uses the regex matcher, which
> is very slow at matching words.  Therefore, we use the grep matcher only when
> START_PTR is not set.
> 
> For example, although this is a very extreme case...
> 
> $ cat >pat <<EOF
> 0
> 00 0
> 00 00 0
> 00 00 00 0
> 00 00 00 00 0
> 00 00 00 00 00 0
> 00 00 00 00 00 00 0
> 00 00 00 00 00 00 00 0
> 00 00 00 00 00 00 00 00 0
> 00 00 00 00 00 00 00 00 00 0
> 00 00 00 00 00 00 00 00 00 00 0
> 00 00 00 00 00 00 00 00 00 00 00 0
> 00 00 00 00 00 00 00 00 00 00 00 00 0
> 00 00 00 00 00 00 00 00 00 00 00 00 00 0
> EOF
> $ yes '00 00 00 00 00 00 00 00 00 00 00 00 00' | head -1000000 >inp
> 
> $ env LC_ALL=C time -p src/grep -wf pat inp
> real 5.75
> user 5.67
> sys 0.02
> 
> Retry after applying the patch.
> 
> $ env LC_ALL=C time -p src/grep -wf pat inp
> real 0.32
> user 0.31
> sys 0.00
> 
> Thanks,
> Norihiro

I have fixed the previous patch.

This change should not be applied in multibyte locales.  There, the grep
matcher does word matching with a regex whose pattern contains an inverted
character class, and that is very slow.
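
The conditions discussed in this thread can be sketched roughly as follows.
This is only an illustration and not the code in the attached patch: the
names is_word_byte, match_is_word and should_fall_back are hypothetical, the
check assumes a single-byte locale, and the caller is assumed to already know
whether START_PTR is set and whether the locale is multibyte.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: a word constituent is a letter, digit or underscore.
   Valid only for single-byte locales.  */
static bool
is_word_byte (unsigned char c)
{
  return isalnum (c) || c == '_';
}

/* Return true if the match LINE[BEG .. BEG+LEN) is delimited on both sides
   by a non-word byte or by the edge of the line.  */
static bool
match_is_word (const char *line, size_t line_len, size_t beg, size_t len)
{
  bool left_ok = beg == 0 || !is_word_byte ((unsigned char) line[beg - 1]);
  bool right_ok = (beg + len == line_len
                   || !is_word_byte ((unsigned char) line[beg + len]));
  return left_ok && right_ok;
}

/* Hypothetical decision: hand the line to the grep matcher only when the
   first keyword match is not a word, START_PTR is not set, and the locale
   is not multibyte.  */
static bool
should_fall_back (bool first_match_is_word, bool have_start_ptr,
                  bool multibyte_locale)
{
  return !first_match_is_word && !have_start_ptr && !multibyte_locale;
}
```

With the input line from the example, the keyword "00 0" is found at the
start of "00 00 ..." but is followed by another "0", so match_is_word
returns false for it.  That is the situation in which a keyword search keeps
finding matches that are not words.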

Attachment: 0001-grep-fix-slow-multiple-word-matching.patch
Description: Text document

