bug-gnu-utils
Re: bug in gawk 3.1.1?


From: Aharon Robbins
Subject: Re: bug in gawk 3.1.1?
Date: Thu, 5 Sep 2002 13:44:59 +0300

In article <address@hidden>,
Stepan Kasal <address@hidden> wrote:
>Hello Lorenzo,
>
>On Wed, 04 Sep 2002 17:58:27 +0200, LorenzAtWork wrote:
>> address@hidden (Aharon Robbins) wrote:
>> > [...]
>> > I fixed this basically by using a heuristic.  If the RS regex
>> > ends in ?, *, or +, and the end of the regex match is within
>> > a few bytes of the end of the buffer, then read in some more
>> > text and try again.
>> > [...]
>
>>      /a(b+c)*bc/
>
>Or:
>       /a(|bcbbc)bc/

Yeah, I was hoping no-one would notice. Oh well.

>Thank you very much, I guess you are right and the heuristics should
>be applied in each case the regex _contains_ any of the following chars:
>
>       ? * + |
>
>Extending the heuristics this way should not break anything, it has the
>same properties as the original Aharon's:
>
>1) it catches some of the problem cases (though not all, of course,
>consider "a.*Z" with an occurrence of "Z" at the end of file)
>
>2) it doesn't represent any problem except slight memory inefficiency
>when non-trivial RE's are used as RS.
>
>Have a nice day,
>       Stepan Kasal

Here's a patch, relative to yesterday's.  Have fun.

Arnold
------------------------------------------
*** awk.h.save  Wed Aug 21 15:40:04 2002
--- awk.h       Thu Sep  5 13:07:46 2002
***************
*** 1002,1007 ****
--- 1002,1008 ----
  extern void resyntax P((int syntax));
  extern void resetup P((void));
  extern int reisstring P((char *text, size_t len, Regexp *re, char *buf));
+ extern int remaybelong P((char *text, size_t len));
  
  /* strncasecmp.c */
  #ifndef BROKEN_STRNCASECMP
*** re.c.save   Wed Aug 21 13:52:10 2002
--- re.c        Thu Sep  5 13:07:22 2002
***************
*** 284,308 ****
  {
        static char metas[] = ".*+(){}[]|?^$\\";
        int i;
-       int has_meta = FALSE;
        int res;
        char *matched;
  
        /* simple checking for has meta characters in re */
        for (i = 0; i < len; i++) {
                if (strchr(metas, text[i]) != NULL) {
!                       has_meta = TRUE;
!                       break;
                }
        }
  
        /* make accessable to gdb */
        matched = &buf[RESTART(re, buf)];
  
-       if (has_meta)
-               return FALSE;   /* give up early, can't be string match */
- 
        res = STREQN(text, matched, len);
  
        return res;
  }
--- 284,317 ----
  {
        static char metas[] = ".*+(){}[]|?^$\\";
        int i;
        int res;
        char *matched;
  
        /* simple checking for has meta characters in re */
        for (i = 0; i < len; i++) {
                if (strchr(metas, text[i]) != NULL) {
!                       return FALSE;   /* give up early, can't be string match */
                }
        }
  
        /* make accessable to gdb */
        matched = &buf[RESTART(re, buf)];
  
        res = STREQN(text, matched, len);
  
        return res;
  }
+ 
+ /* remaybelong --- return TRUE if the RE contains * ? | + */
+ 
+ int
+ remaybelong(char *text, size_t len)
+ {
+       while (len--) {
+               if (strchr("*+|?", *text++) != NULL) {
+                       return TRUE;
+               }
+       }
+ 
+       return FALSE;
+ }
*** io.c.fix1   Wed Sep  4 13:17:37 2002
--- io.c        Thu Sep  5 13:23:41 2002
***************
*** 2630,2642 ****
                         * This matches the "xyz" and ends up putting the
                         * "abc" into the front of the next record. Ooops.
                         *
!                        * The test for a *, +, or ? at the end of the RE
!                        * is a heuristic (spelled k l u d g e).
                         */
                        /* succession of tests is easier to trace in GDB. */
                        if (iop->cnt != EOF) {
!                               if (strchr("+*?", RS->stptr[RS->stlen-1]) != NULL) {
!                                       if ((iop->end - (start+REEND(rsre,start))) < RS->stlen) {
                                                bp = iop->end;
                                                continuing = TRUE;
                                                continue;
--- 2630,2647 ----
                         * This matches the "xyz" and ends up putting the
                         * "abc" into the front of the next record. Ooops.
                         *
!                        * The remaybelong() function looks to see if the
!                        * regex contains one of: + * ? |.  This is a very
!                        * simple heuristic, but in combination with the
!                        * "end of match within a few bytes of end of buffer"
!                        * check, should keep things reasonable.
                         */
                        /* succession of tests is easier to trace in GDB. */
                        if (iop->cnt != EOF) {
!                               if (remaybelong(RS->stptr, RS->stlen)) {
!                                       char *matchend = start + REEND(rsre, start);
! 
!                                       if (iop->end - matchend < RS->stlen) {
                                                bp = iop->end;
                                                continuing = TRUE;
                                                continue;
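
For anyone who wants to poke at the heuristic outside the gawk tree, here
is a minimal standalone sketch. It is not part of the patch. The names
remaybelong_sketch() and match_near_buffer_end() are made up for
illustration, TRUE/FALSE are replaced by 1/0, and the second function only
models the "iop->end - matchend < RS->stlen" comparison under the
assumption that matchend never lies past the end of the buffer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Same scan as remaybelong() in the patch: return 1 if the RS text
 * contains any of * + | ?, else 0.  As in the patch, an embedded NUL
 * byte would also count, because strchr() matches the terminator of
 * the "*+|?" string.
 */
static int
remaybelong_sketch(const char *text, size_t len)
{
	while (len--) {
		if (strchr("*+|?", *text++) != NULL)
			return 1;
	}

	return 0;
}

/*
 * Hypothetical helper modelling the io.c test: return 1 if the match
 * ends fewer than rslen bytes before the end of the buffer.
 * Assumes matchend <= bufend.
 */
static int
match_near_buffer_end(const char *matchend, const char *bufend, size_t rslen)
{
	return (size_t) (bufend - matchend) < rslen;
}
```

With these, both "a(b+c)*bc" and "a(|bcbbc)bc" are flagged by the character
scan, while a plain "abc" or "a.c" is not. "a.*Z" is flagged too, but if the
match found in the buffer ends well before the end of the buffer, the
second test is false and no extra text is read, which is the case Stepan
describes as still not caught.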
-- 
Aharon (Arnold) Robbins --- Pioneer Consulting Ltd.     address@hidden
P.O. Box 354            Home Phone: +972  8 979-0381    Fax: +1 928 569 9018
Nof Ayalon              Cell Phone: +972 51  297-545
D.N. Shimshon 99785     ISRAEL



