bug-gnu-utils

Re: Memory exhausted when doing a case-insensitive match to an empty reg


From: Tony Leneis
Subject: Re: Memory exhausted when doing a case-insensitive match to an empty regexp (gawk)
Date: Wed, 26 Oct 2005 15:42:07 -0700
User-agent: Mutt/1.5.4i

On Wed, Oct 26, 2005 at 04:31:44PM +0200, Karel Zak wrote:
> On Tue, 2005-10-25 at 23:25 -0700, Tony Leneis wrote:
> >     Gawk has started having problems with case-insensitive empty
> > regexp matches sometime between version 3.1.1 and 3.1.4.  Here's what I
> > see with gawk 3.1.4 and 3.1.5:
> > 
> > # gawk 'BEGIN { IGNORECASE=0; print "test" ~ "" }'
> > 1
> > # gawk 'BEGIN { IGNORECASE=1; print "test" ~ "" }'
> > gawk: fatal: memory exhausted
> > 
> > When I try the same program under gawk 3.0.3, 3.1.0, and 3.1.1 I get a
> > response of 1 regardless of how IGNORECASE is set.
> 
> It works for me:
> 
>       $ ./gawk 'BEGIN { IGNORECASE=1; print "test" ~ "" }';
>       1
>       $ ./gawk --version | head -1
>       GNU Awk 3.1.5
> 
> Note that it's raw upstream version without any patch. You should try it
> with gdb.

The error comes from dfacomp() in dfa.c; gawk itself isn't crashing.
Running with the dfa code disabled works:

# GAWK_NO_DFA=1 gawk 'BEGIN { IGNORECASE=1; print "test" ~ "" }'
1

The following code is near the top of dfacomp() and is run if case_fold
is set:

      lcopy = malloc(len);
      if (!lcopy)
        dfaerror(_("memory exhausted"));

My guess is that len == 0, so malloc() is being asked to allocate a
block of 0 bytes.  According to my copy of the standard C library
reference, the behavior of malloc() in that case is
implementation-defined: it may return either a NULL pointer or a unique
pointer that must not be dereferenced.  The implementation on my system
happens to return NULL, which then triggers dfaerror().  Your
implementation probably returned a unique non-NULL pointer to a
zero-byte block.
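
To make the zero-length case concrete, here is a minimal sketch of a
folding copy that cannot confuse "malloc(0) returned NULL" with a real
out-of-memory condition.  fold_copy() is a hypothetical helper written
for illustration, not code from dfa.c, and it assumes the copy exists
only to hold a lowercased version of the pattern:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper, not part of dfa.c: return a lowercased copy of
   the first len bytes of s.  A NULL result always means a genuine
   allocation failure, even when len == 0.  The caller frees the copy. */
char *
fold_copy (char const *s, size_t len)
{
  char *copy;
  size_t i;

  assert (s != NULL || len == 0);

  /* malloc (0) may legitimately return NULL, so never ask for 0 bytes. */
  copy = malloc (len ? len : 1);
  if (!copy)
    return NULL;

  for (i = 0; i < len; i++)
    copy[i] = (char) tolower ((unsigned char) s[i]);

  return copy;
}
```

With this shape, fold_copy ("", 0) returns a non-NULL pointer on any
conforming implementation unless memory really is exhausted, so the
caller's NULL check keeps its intended meaning.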

Here is an extremely naive and barely tested patch that seems to solve
the problem for me.  It simply skips the folding step for an empty
regexp, since an empty case-insensitive regexp should be handled the
same way as an empty case-sensitive one.  Note that this is just the
quick hack I did to make gawk 3.1.5 work on my system, and is not
necessarily the best way to solve the problem:

--- dfa.c.orig  2005-10-26 22:20:10.000000000 +0000
+++ dfa.c       2005-10-26 22:20:26.000000000 +0000
@@ -3060,7 +3060,7 @@
 void
 dfacomp (char const *s, size_t len, struct dfa *d, int searchflag)
 {
-  if (case_fold)       /* dummy folding in service of dfamust() */
+  if (case_fold && len)        /* dummy folding in service of dfamust() */
     {
       char *lcopy;
       int i;

For example, it might be better to do something in re.c so that the dfa
code is not used at all when the regexp length is 0.
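
A rough sketch of what such a caller-side check could look like.  The
function name and both parameters are hypothetical and do not reflect
how re.c is actually structured; dfa_disabled stands in for whatever
flag the GAWK_NO_DFA setting ends up controlling:

```c
#include <stddef.h>

/* Hypothetical predicate, not gawk's actual re.c: decide whether a
   pattern should be handed to the dfa code at all. */
int
should_use_dfa (size_t pattern_len, int dfa_disabled)
{
  if (dfa_disabled)
    return 0;           /* user asked for the dfa code to be skipped */
  if (pattern_len == 0)
    return 0;           /* empty regexp: never reach the malloc (len) */
  return 1;
}
```

The point of checking in the caller is that dfacomp() never sees a
zero-length pattern, so the folding code needs no special case.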

-Tony

