help-octave

Re: strange regexp behavior


From: David Bateman
Subject: Re: strange regexp behavior
Date: Tue, 23 Sep 2008 12:00:46 +0200
User-agent: Thunderbird 2.0.0.16 (X11/20080725)

Thomas Weber wrote:
On Tuesday, 23.09.2008 at 11:04 +0200, David Bateman wrote:
Thomas Weber wrote:
On Thu, Sep 18, 2008 at 11:34:55AM +0200, David Bateman wrote:
Thomas Weber wrote:
I'm currently discussing this with PCRE developers on their list, http://lists.exim.org/lurker/message/20080915.202357.7f71f452.en.html

I already learned that this innocent-looking pattern is a bad citizen :)
Ok, tell us what happens with this thread.
Patch attached; I've included a user-visible warning, because Phil Hazel
(pcre's author) suggested that a regexp triggering such a problem should
be rephrased.
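Why the pattern is a "bad citizen": in the test pattern `(\s*-*\d+[.]*\d*\s*)+\n`, an outer `+` wraps a group whose pieces can each consume a variable share of the same run of digits. When the trailing `\n` fails, a backtracking engine such as PCRE retries every way of carving the input into iterations. The minimal sketch below uses the simplified pattern `(\d+)+` as an illustrative assumption, not the real one, and counts those ways for a run of n digits. The count is 2^(n-1), so each extra digit doubles the work. `count_splits` is a hypothetical helper.

```cpp
#include <cstdint>

// Count the ways a backtracking matcher can split a run of n digits among
// iterations of the simplified pattern (\d+)+, where every iteration must
// take at least one digit. The recursion deliberately mirrors backtracking:
// each call tries every possible length for the current iteration.
std::uint64_t count_splits(unsigned n)
{
  if (n == 0)
    return 1;  // all digits consumed: exactly one complete way to split them
  std::uint64_t total = 0;
  for (unsigned k = 1; k <= n; ++k)  // current iteration takes k digits
    total += count_splits(n - k);
  return total;
}
```

For example, `count_splits(16)` returns 32768 (2^15). In the real pattern, `\d+[.]*\d*` lets each iteration divide a digit run in still more ways, so it can only add backtracking paths.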

Maybe calling OCTAVE_QUIT in the added while-loop would be sensible as
well.

The patch is against 3.0.2; I had trouble compiling the current tip.
        
Looking at the patch, it looks OK to me; however, maybe

while (matches == PCRE_ERROR_MATCHLIMIT)
  {
    pe.match_limit *= 10;
    matches = pcre_exec(re, &pe, buffer.c_str(),
                        buffer.length(), idx,
                        (idx ? PCRE_NOTBOL : 0),
                        ovector, (subpatterns+1)*3);
  }

should be written as

#define PCRE_MATCHLIMIT_MAX 10

Yes, that's definitely a good idea.

int i = 0;
while (matches == PCRE_ERROR_MATCHLIMIT && i++ < PCRE_MATCHLIMIT_MAX)
  {
    pe.match_limit *= 10;
    matches = pcre_exec(re, &pe, buffer.c_str(),
                        buffer.length(), idx,
                        (idx ? PCRE_NOTBOL : 0),
                        ovector, (subpatterns+1)*3);
  }

to avoid a truly infinite recursion in the pattern string. The max limit might need adjusting. What issues did you have compiling with the tip?
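The bounded retry can be exercised outside Octave. The minimal sketch below makes two assumptions: a hypothetical `try_match` callback replaces `pcre_exec`, and a hypothetical `MatchStatus` enum replaces PCRE's return codes. The limit grows tenfold per retry, and the loop stops after at most `kMatchLimitMax` retries (mirroring `PCRE_MATCHLIMIT_MAX`). A pattern that never finishes therefore still returns a match-limit status instead of looping forever. Here the counter is incremented inside the loop body rather than via `i++` in the condition, so after giving up it equals the number of retries actually made.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical stand-ins for pcre_exec's outcomes: success, "no match"
// (like PCRE_ERROR_NOMATCH) and "limit exceeded" (like PCRE_ERROR_MATCHLIMIT).
enum class MatchStatus { Matched, NoMatch, MatchLimit };

// Mirrors PCRE_MATCHLIMIT_MAX from the thread: the maximum number of retries.
constexpr int kMatchLimitMax = 10;

struct RetryResult
{
  MatchStatus status;         // outcome of the last attempt
  int retries;                // retries made after the first attempt
  std::uint64_t final_limit;  // limit used for the last attempt
};

// Run try_match once with initial_limit; while it reports MatchLimit, retry
// with a tenfold larger limit, at most kMatchLimitMax times. uint64_t is used
// so that ten tenfold increases from 10,000,000 (giving 1e17) still fit.
RetryResult match_with_retries(
    const std::function<MatchStatus(std::uint64_t)>& try_match,
    std::uint64_t initial_limit)
{
  std::uint64_t limit = initial_limit;
  MatchStatus status = try_match(limit);
  int retries = 0;
  while (status == MatchStatus::MatchLimit && retries < kMatchLimitMax)
    {
      ++retries;
      limit *= 10;
      status = try_match(limit);
    }
  return {status, retries, limit};
}
```

For instance, a matcher that needs a limit of at least 1000 succeeds after two retries when started at 10. A matcher that always hits the limit gives up after exactly ten retries.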

Something with the lexer, I think. Maybe I was missing bison?

        Thomas


OK, then what about the attached changeset for 3.1.51+, based on your patch, that:

* Adds a call to OCTAVE_QUIT
* Adds a loop counter so that truly infinitely recursive patterns terminate
* Adds a test based on the previously failing code.
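Calling OCTAVE_QUIT inside the retry loop matters because each retry with a tenfold larger limit can run for a long time. Checking for a pending interrupt once per iteration lets Ctrl-C stop the loop. The sketch below is only an analogue of that idea, not Octave's OCTAVE_QUIT macro. `interrupt_requested`, `interrupted`, `check_for_interrupt` and `slow_retries` are hypothetical names, and the flag stands in for whatever state a SIGINT handler would set.

```cpp
#include <atomic>
#include <stdexcept>

// Hypothetical interrupt flag; in a real program a SIGINT (Ctrl-C) handler
// would set it. Octave's own interrupt machinery is not modelled here.
std::atomic<bool> interrupt_requested{false};

// Hypothetical exception used to unwind out of the loop when interrupted.
struct interrupted : std::runtime_error
{
  interrupted() : std::runtime_error("interrupted by user") {}
};

// Clear the flag and throw if an interrupt is pending, the role the
// OCTAVE_QUIT call plays at the top of the patch's retry loop.
inline void check_for_interrupt()
{
  if (interrupt_requested.exchange(false))
    throw interrupted();
}

// Hypothetical slow loop: each iteration stands in for one expensive
// pcre_exec call. Returns how many iterations completed.
int slow_retries(int max_retries)
{
  int done = 0;
  for (int i = 0; i < max_retries; ++i)
    {
      check_for_interrupt();
      ++done;  // expensive matching work would happen here
    }
  return done;
}
```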

I compiled and tested this successfully with 3.1.51+ and suppose it will work for 3.0.2+ as well.

Cheers
David

--
David Bateman                                address@hidden
Motorola Labs - Paris                         +33 1 69 35 48 04 (Ph)
Parc Les Algorithmes, Commune de St Aubin     +33 6 72 01 06 33 (Mob)
91193 Gif-Sur-Yvette FRANCE                   +33 1 69 35 77 01 (Fax)

The information contained in this communication has been classified as:
[x] General Business Information
[ ] Motorola Internal Use Only
[ ] Motorola Confidential Proprietary

# HG changeset patch
# User Thomas Weber <address@hidden>
# Date 1222163892 -7200
# Node ID 851180c51fbf7300ca43bd8fcbb952b7e7ffd68a
# Parent  f148896a9053ce7e3bf8df2f0c4391e5f4c15623
Increase pcre's match_limit for difficult regexps

diff --git a/src/DLD-FUNCTIONS/regexp.cc b/src/DLD-FUNCTIONS/regexp.cc
--- a/src/DLD-FUNCTIONS/regexp.cc
+++ b/src/DLD-FUNCTIONS/regexp.cc
@@ -51,6 +51,10 @@ along with Octave; see the file COPYING.
 #endif
 #include <regex.h>
 #endif
+
+// Define the maximum number of retries for a pattern that 
+// possibly results in an infinite recursion.
+#define PCRE_MATCHLIMIT_MAX 10
 
 // The regexp is constructed as a linked list to avoid resizing the
 // return values in arrays at each new match.
@@ -385,9 +389,31 @@ octregexp_list (const octave_value_list 
                                  (idx ? PCRE_NOTBOL : 0),
                                  ovector, (subpatterns+1)*3);
 
+         if (matches == PCRE_ERROR_MATCHLIMIT)
+           {
+             // try harder; start with default value for MATCH_LIMIT and increase it
+             warning("Your pattern caused PCRE to hit its MATCH_LIMIT.\nTrying harder now, but this will be slow.");
+             pcre_extra pe;
+             pcre_config(PCRE_CONFIG_MATCH_LIMIT, static_cast <void *> (&pe.match_limit));
+             pe.flags = PCRE_EXTRA_MATCH_LIMIT;
+
+             int i = 0;
+             while (matches == PCRE_ERROR_MATCHLIMIT &&
+                    i++ < PCRE_MATCHLIMIT_MAX)
+               {
+                 OCTAVE_QUIT;
+
+                 pe.match_limit *= 10;
+                 matches = pcre_exec(re, &pe, buffer.c_str(), 
+                                     buffer.length(), idx, 
+                                     (idx ? PCRE_NOTBOL : 0),
+                                     ovector, (subpatterns+1)*3);
+               }
+           }
+
          if (matches < 0 && matches != PCRE_ERROR_NOMATCH)
            {
-             error ("%s: internal error calling pcre_exec", nm.c_str());
+             error ("%s: internal error calling pcre_exec\nError code from pcre_exec is %i", nm.c_str(), matches);
              pcre_free(re);
              return 0;
            }
@@ -983,6 +1009,17 @@ The pattern is taken literally.\n\
 }
 
 /*
+
+## PCRE_ERROR_MATCHLIMIT test
+%!test
+%! s=sprintf('\t4\n0000\t-0.00\t-0.0000\t4\t-0.00\t-0.0000\t4\n0000\t-0.00\t-0.0000\t0\t-0.00\t-');
+%! ws = warning("query");
+%! unwind_protect
+%!   warning("off");
+%!   regexp(s, '(\s*-*\d+[.]*\d*\s*)+\n');
+%! unwind_protect_cleanup
+%!   warning(ws);
+%! end_unwind_protect
 
 ## seg-fault test
 %!assert(regexp("abcde","."),[1,2,3,4,5])
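The patch leaves one detail open. `pe.match_limit` is an `unsigned long`, and PCRE's default MATCH_LIMIT is 10,000,000 unless PCRE was built with a different value. On platforms where `unsigned long` is 32 bits, the third tenfold increase (to 10^10) already exceeds the type's maximum, so the limit would wrap around to a smaller value instead of growing. A minimal sketch of a saturating multiply that could stand in for `pe.match_limit *= 10` follows, assuming that clamping at the type's maximum is acceptable. `scale_limit` is a hypothetical helper, not part of Octave or PCRE.

```cpp
#include <limits>

// Multiply a match limit by `factor`, clamping at the largest value an
// unsigned long can hold instead of wrapping around on overflow.
unsigned long scale_limit(unsigned long limit, unsigned long factor)
{
  const unsigned long max = std::numeric_limits<unsigned long>::max();
  if (factor != 0 && limit > max / factor)
    return max;  // limit * factor would overflow
  return limit * factor;
}
```

Inside the retry loop this would read `pe.match_limit = scale_limit(pe.match_limit, 10);`. Once the limit saturates, further retries repeat the same attempt, and the retry counter still ends the loop.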
