Re: strange regexp behavior
From: David Bateman
Subject: Re: strange regexp behavior
Date: Tue, 23 Sep 2008 12:00:46 +0200
User-agent: Thunderbird 2.0.0.16 (X11/20080725)
Thomas Weber wrote:
On Tuesday, 23.09.2008, at 11:04 +0200, David Bateman wrote:
Thomas Weber wrote:
On Thu, Sep 18, 2008 at 11:34:55AM +0200, David Bateman wrote:
Thomas Weber wrote:
I'm currently discussing this with PCRE developers on their list,
http://lists.exim.org/lurker/message/20080915.202357.7f71f452.en.html
I already learned that this innocent looking pattern is a bad citizen :)
Ok, tell us what happens with this thread.
Patch attached; I've included a user-visible warning, because Phil Hazel
(pcre's author) suggested that a regexp triggering such a problem should
be rephrased.
Maybe calling OCTAVE_QUIT in the added while-loop would be sensible as
well.
Patch is against 3.0.2; I had trouble compiling current tip.
The patch looks ok to me; however, maybe
while (matches == PCRE_ERROR_MATCHLIMIT)
{
pe.match_limit *= 10;
matches = pcre_exec(re, &pe, buffer.c_str(),
buffer.length(), idx,
(idx ? PCRE_NOTBOL : 0),
ovector, (subpatterns+1)*3);
}
should be written as
#define PCRE_MATCHLIMIT_MAX 10
Yes, that's definitely a good idea.
int i = 0;
while (matches == PCRE_ERROR_MATCHLIMIT && i++ < PCRE_MATCHLIMIT_MAX)
{
pe.match_limit *= 10;
matches = pcre_exec(re, &pe, buffer.c_str(),
buffer.length(), idx,
(idx ? PCRE_NOTBOL : 0),
ovector, (subpatterns+1)*3);
}
to avoid looping forever on a truly infinite recursion in the pattern string.
The max limit might need adjusting. What issues did you have compiling with the tip?
Something with the lexer, I think. Maybe I was missing bison?
Thomas
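To make the "max limit might need adjusting" remark concrete, here is a rough sketch of how far ten tenfold increases push the limit. It assumes a starting value of 10,000,000 (the documented PCRE default for MATCH_LIMIT, which a build may override) and the helper name scaled_limit is invented for this illustration. It saturates instead of wrapping, so it shows where a 32-bit unsigned long would run out of range.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Octave or PCRE): returns the limit after
// `retries` tenfold increases, saturating at `max_value` instead of wrapping.
std::uint64_t scaled_limit(std::uint64_t start, int retries,
                           std::uint64_t max_value)
{
  std::uint64_t limit = start;
  for (int i = 0; i < retries; ++i)
    {
      // Multiplying by 10 would exceed max_value, so clamp and stop.
      if (limit > max_value / 10)
        return max_value;
      limit *= 10;
    }
  return limit;
}
```

With a 64-bit range the tenth increase reaches 10^17, which still fits. With a 32-bit range the third increase (10^10) no longer fits, so on such a platform the multiplication in the loop would wrap before the retry counter stops it.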
Ok, then what about the attached changeset for 3.1.51+ based on your
patch that
* Adds a call to OCTAVE_QUIT
* Adds the loop counter to terminate for truly infinitely recursive patterns
* Adds a test based on the previously failing code.
I compiled and tested this successfully with 3.1.51+ and suppose it will
work for 3.0.2+ as well.
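For readers who want the retry logic in isolation, the following is a minimal sketch of the pattern, not the code from the changeset. The matcher is passed in as a callback standing in for pcre_exec, the interrupt check stands in for OCTAVE_QUIT, and the names kLimitHit, kMaxRetries, RetryResult and match_with_retries are all made up for this example.

```cpp
#include <functional>

// Placeholder status code standing in for PCRE_ERROR_MATCHLIMIT.
constexpr int kLimitHit = -8;
// Upper bound on retries, mirroring PCRE_MATCHLIMIT_MAX in the patch.
constexpr int kMaxRetries = 10;

struct RetryResult
{
  int status;           // last status returned by the matcher
  unsigned long limit;  // limit used for the last attempt
  int retries;          // number of retries after the first attempt
};

// Calls try_match(limit); while it reports that the limit was hit, raises
// the limit tenfold and tries again, giving up after kMaxRetries retries.
RetryResult match_with_retries(
    const std::function<int(unsigned long)>& try_match,
    unsigned long initial_limit,
    const std::function<void()>& check_interrupt = [] {})
{
  unsigned long limit = initial_limit;
  int status = try_match(limit);
  int retries = 0;

  while (status == kLimitHit && retries < kMaxRetries)
    {
      check_interrupt();  // lets the user abort a long-running match
      limit *= 10;
      status = try_match(limit);
      ++retries;
    }

  return {status, limit, retries};
}
```

If the matcher still reports the limit after the last retry, the caller sees kLimitHit in the returned status and can report it as an error, which is what the changed error message in the patch does with the pcre_exec return code.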
Cheers
David
--
David Bateman address@hidden
Motorola Labs - Paris +33 1 69 35 48 04 (Ph)
Parc Les Algorithmes, Commune de St Aubin +33 6 72 01 06 33 (Mob)
91193 Gif-Sur-Yvette FRANCE +33 1 69 35 77 01 (Fax)
# HG changeset patch
# User Thomas Weber <address@hidden>
# Date 1222163892 -7200
# Node ID 851180c51fbf7300ca43bd8fcbb952b7e7ffd68a
# Parent f148896a9053ce7e3bf8df2f0c4391e5f4c15623
Increase pcre's match_limit for difficult regexps
diff --git a/src/DLD-FUNCTIONS/regexp.cc b/src/DLD-FUNCTIONS/regexp.cc
--- a/src/DLD-FUNCTIONS/regexp.cc
+++ b/src/DLD-FUNCTIONS/regexp.cc
@@ -51,6 +51,10 @@ along with Octave; see the file COPYING.
#endif
#include <regex.h>
#endif
+
+// Define the maximum number of retries for a pattern that
+// possibly results in an infinite recursion.
+#define PCRE_MATCHLIMIT_MAX 10
// The regexp is constructed as a linked list to avoid resizing the
// return values in arrays at each new match.
@@ -385,9 +389,31 @@ octregexp_list (const octave_value_list
(idx ? PCRE_NOTBOL : 0),
ovector, (subpatterns+1)*3);
+ if (matches == PCRE_ERROR_MATCHLIMIT)
+ {
+ // try harder; start with default value for MATCH_LIMIT and increase it
+ warning("Your pattern caused PCRE to hit its MATCH_LIMIT.\nTrying harder now, but this will be slow.");
+ pcre_extra pe;
+ pcre_config(PCRE_CONFIG_MATCH_LIMIT, static_cast <void *> (&pe.match_limit));
+ pe.flags = PCRE_EXTRA_MATCH_LIMIT;
+
+ int i = 0;
+ while (matches == PCRE_ERROR_MATCHLIMIT &&
+ i++ < PCRE_MATCHLIMIT_MAX)
+ {
+ OCTAVE_QUIT;
+
+ pe.match_limit *= 10;
+ matches = pcre_exec(re, &pe, buffer.c_str(),
+ buffer.length(), idx,
+ (idx ? PCRE_NOTBOL : 0),
+ ovector, (subpatterns+1)*3);
+ }
+ }
+
if (matches < 0 && matches != PCRE_ERROR_NOMATCH)
{
- error ("%s: internal error calling pcre_exec", nm.c_str());
+ error ("%s: internal error calling pcre_exec\nError code from pcre_exec is %i", nm.c_str(), matches);
pcre_free(re);
return 0;
}
@@ -983,6 +1009,17 @@ The pattern is taken literally.\n\
}
/*
+
+## PCRE_ERROR_MATCHLIMIT test
+%!test
+%! s=sprintf('\t4\n0000\t-0.00\t-0.0000\t4\t-0.00\t-0.0000\t4\n0000\t-0.00\t-0.0000\t0\t-0.00\t-');
+%! ws = warning("query");
+%! unwind_protect
+%! warning("off");
+%! regexp(s, '(\s*-*\d+[.]*\d*\s*)+\n');
+%! unwind_protect_cleanup
+%! warning(ws);
+%! end_unwind_protect
## seg-fault test
%!assert(regexp("abcde","."),[1,2,3,4,5])