bison-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] yysyntax_error: make it manage its own memory.


From: Joel E. Denny
Subject: Re: [PATCH] yysyntax_error: make it manage its own memory.
Date: Fri, 11 Sep 2009 03:40:19 -0400 (EDT)
User-agent: Alpine 1.00 (DEB 882 2007-12-20)

On Thu, 10 Sep 2009, Akim Demaille wrote:

> Le 10 sept. 2009 à 07:16, Joel E. Denny a écrit :

> > > From d3334adb83a8b87cfa7528ab184d0dbbe7283858 Mon Sep 17 00:00:00 2001
> > From: Joel E. Denny <address@hidden>
> > Date: Mon, 7 Sep 2009 22:01:16 -0400
> > Subject: [PATCH] yysyntax_error: make it manage its own memory.

> I'm not sure I understand what's going on here.  The code is meant to be used
> with either alloca or malloc, and now that you moved everything in
> syntax_error, I fail to see how alloca could work as expected.  Reading the
> patch (not the resulting code), it really seems like you're possibly returning
> some pointer to a alloca'ed block of memory.
> 
> I can't see how alloca could be used if not the way Paul made it.

You're right!  I forgot about alloca.  Thanks for catching that.

Here are two new patches that I'd like to apply to master and branch-2.5.

The first patch adds two new test groups to regression.at.  I normally 
don't like extending regression.at because I feel like it's the 
"miscellaneous" or "I can't think of a better category" file.  
Unfortunately, right now, I can't think of a better category.

The first test group exercises alloca with verbose error messages.  It 
reveals the bug you found in my above patch.  The second test group 
reveals an obscure bug in the existing verbose error message 
implementation.  I'm not sure how likely that bug is in a practical 
parser, but I guess it doesn't hurt to test for it.  Because both of these 
tests seem heavily dependent upon the exact yacc.c implementation we have 
now, I've added some sanity checks to them to be sure they stay meaningful 
in the future.

The second patch is another attempt to prepare yysyntax_error for my 
lookahead correction code but without breaking alloca support.  It also 
fixes that obscure bug I just mentioned.  More importantly, I think it 
makes the code more readable and slightly more efficient.  The approach 
this time is that yysyntax_error only requires a second invocation if 
reallocation is needed.

I'm finding this to be a tricky area.  I'd appreciate any comments anyone 
has on these patches.

>From dc23d678388635a058492294f6cd45105f581c98 Mon Sep 17 00:00:00 2001
From: Joel E. Denny <address@hidden>
Date: Thu, 10 Sep 2009 18:29:01 -0400
Subject: [PATCH] yacc.c: test yysyntax_error memory management more.

* tests/regression.at (-Dparse.error=verbose and
YYSTACK_USE_ALLOCA): New test group.
(-Dparse.error=verbose overflow): New test group that reveals
an obscure bug.  Expected fail for now.
---
 ChangeLog           |    8 ++
 tests/regression.at |  200 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 208 insertions(+), 0 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 6c97582..b508741 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,13 @@
 2009-09-10  Joel E. Denny  <address@hidden>
 
+       yacc.c: test yysyntax_error memory management more.
+       * tests/regression.at (-Dparse.error=verbose and
+       YYSTACK_USE_ALLOCA): New test group.
+       (-Dparse.error=verbose overflow): New test group that reveals
+       an obscure bug.  Expected fail for now.
+
+2009-09-10  Joel E. Denny  <address@hidden>
+
        Clean up yacc.c a little.
        * data/yacc.c: Clean up M4 for readability, and make output
        whitespace more consistent.  For the main parse function
diff --git a/tests/regression.at b/tests/regression.at
index 6bfc77e..7e17a91 100644
--- a/tests/regression.at
+++ b/tests/regression.at
@@ -1237,3 +1237,203 @@ AT_COMPILE([[input]])
 AT_PARSER_CHECK([[./input]])
 
 AT_CLEANUP
+
+
+
+## ---------------------------------------------- ##
+## -Dparse.error=verbose and YYSTACK_USE_ALLOCA.  ##
+## ---------------------------------------------- ##
+
+AT_SETUP([[-Dparse.error=verbose and YYSTACK_USE_ALLOCA]])
+
+AT_DATA_GRAMMAR([input.y],
+[[%code {
+  #include <stdio.h>
+  void yyerror (char const *);
+  int yylex (void);
+  #define YYSTACK_USE_ALLOCA 1
+}
+
+%define parse.error verbose
+
+%%
+
+start: check syntax_error syntax_error ;
+
+check:
+{
+  if (sizeof yymsgbuf > 128)
+    {
+      fprintf (stderr,
+               "The initial size of yymsgbuf in yyparse has increased\n"
+               "since this test group was last updated.  As a result,\n"
+               "this test group may no longer manage to induce a\n"
+               "reallocation of the syntax error message buffer.\n"
+               "This test group must be adjusted to produce a longer\n"
+               "error message.\n");
+      YYABORT;
+    }
+}
+;
+
+// Induce a syntax error message whose total length is more than
+// sizeof yymsgbuf in yyparse.  Each token here is 64 bytes.
+syntax_error:
+  "123456789112345678921234567893123456789412345678951234567896123A"
+| "123456789112345678921234567893123456789412345678951234567896123B"
+| error 'a' 'b' 'c'
+;
+
+%%
+
+void
+yyerror (char const *msg)
+{
+  fprintf (stderr, "%s\n", msg);
+}
+
+int
+yylex (void)
+{
+  /* Induce two syntax error messages (which requires full error
+     recovery by shifting 3 tokens) in order to detect any loss of the
+     reallocated buffer.  */
+  static char const *input = "abc";
+  return *input++;
+}
+
+int
+main (void)
+{
+  return yyparse ();
+}
+]])
+
+AT_BISON_CHECK([[-o input.c input.y]])
+AT_COMPILE([[input]])
+AT_PARSER_CHECK([[./input]], [[1]], [],
+[[syntax error, unexpected 'a', expecting 
123456789112345678921234567893123456789412345678951234567896123A or 
123456789112345678921234567893123456789412345678951234567896123B
+syntax error, unexpected $end, expecting 
123456789112345678921234567893123456789412345678951234567896123A or 
123456789112345678921234567893123456789412345678951234567896123B
+]])
+
+AT_CLEANUP
+
+
+
+## -------------------------------- ##
+## -Dparse.error=verbose overflow.  ##
+## -------------------------------- ##
+
+# Imagine the case where YYSTACK_ALLOC_MAXIMUM = YYSIZE_MAXIMUM and an
+# invocation of yysyntax_error has caused yymsg_alloc to grow to exactly
+# YYSTACK_ALLOC_MAXIMUM (perhaps because the normal doubling of size had
+# to be clipped to YYSTACK_ALLOC_MAXIMUM).  Now imagine a subsequent
+# invocation of yysyntax_error that overflows during its size
+# calculation and thus returns YYSIZE_MAXIMUM to yyparse.  Then, yyparse
+# will invoke yyerror using the old contents of yymsg.  This bug needs
+# to be fixed.
+
+AT_SETUP([[-Dparse.error=verbose overflow]])
+
+AT_XFAIL_IF([[:]])
+
+AT_DATA_GRAMMAR([input.y],
+[[%code {
+  #include <stdio.h>
+  void yyerror (char const *);
+  int yylex (void);
+
+  /* This prevents this test case from having to induce error messages
+     large enough to overflow size_t.  */
+  #define YYSIZE_T unsigned char
+
+  /* Bring in malloc so yacc.c doesn't try to provide a malloc prototype
+     using our YYSIZE_T.  */
+  #include <stdlib.h>
+
+  /* Max depth is usually much smaller than YYSTACK_ALLOC_MAXIMUM, and
+     we don't want gcc to warn everywhere this constant would be too big
+     to make sense for our YYSIZE_T.  */
+  #define YYMAXDEPTH 100
+}
+
+%define parse.error verbose
+
+%%
+
+start: syntax_error1 check syntax_error2 ;
+
+// Induce a syntax error message whose total length causes yymsg in
+// yyparse to be reallocated to size YYSTACK_ALLOC_MAXIMUM, which
+// should be 255.  Each token here is 64 bytes.
+syntax_error1:
+  "123456789112345678921234567893123456789412345678951234567896123A"
+| "123456789112345678921234567893123456789412345678951234567896123B"
+| "123456789112345678921234567893123456789412345678951234567896123C"
+| error 'a' 'b' 'c'
+;
+
+check:
+{
+  if (yymsg_alloc != YYSTACK_ALLOC_MAXIMUM
+      || YYSTACK_ALLOC_MAXIMUM != YYSIZE_MAXIMUM
+      || YYSIZE_MAXIMUM != 255)
+    {
+      fprintf (stderr,
+               "The assumptions of this test group are no longer\n"
+               "valid, so it may no longer catch the error it was\n"
+               "designed to catch.\n");
+      YYABORT;
+    }
+}
+;
+
+// Now overflow.
+syntax_error2:
+  "123456789112345678921234567893123456789412345678951234567896123A"
+| "123456789112345678921234567893123456789412345678951234567896123B"
+| "123456789112345678921234567893123456789412345678951234567896123C"
+| "123456789112345678921234567893123456789412345678951234567896123D"
+| "123456789112345678921234567893123456789412345678951234567896123E"
+;
+
+%%
+
+void
+yyerror (char const *msg)
+{
+  fprintf (stderr, "%s\n", msg);
+}
+
+int
+yylex (void)
+{
+  /* Induce two syntax error messages (which requires full error
+     recovery by shifting 3 tokens).  */
+  static char const *input = "abc";
+  return *input++;
+}
+
+int
+main (void)
+{
+  return yyparse ();
+}
+]])
+
+AT_BISON_CHECK([[-o input.c input.y]])
+
+# gcc warns about tautologies and fallacies involving comparisons for
+# unsigned char.  However, it doesn't produce these same warnings for
+# size_t and many other types when the warnings would seem to make just
+# as much sense.  We ignore the warnings.
+CFLAGS=`echo x"$CFLAGS" | sed s/-Werror// | sed s/^x//`
+AT_COMPILE([[input]])
+
+AT_PARSER_CHECK([[./input]], [[2]], [],
+[[syntax error, unexpected 'a', expecting 
123456789112345678921234567893123456789412345678951234567896123A or 
123456789112345678921234567893123456789412345678951234567896123B or 
123456789112345678921234567893123456789412345678951234567896123C
+syntax error
+memory exhausted
+]])
+
+AT_CLEANUP
-- 
1.5.4.3


>From b9bdbec37d95944fac7226374f6c5e325c808558 Mon Sep 17 00:00:00 2001
From: Joel E. Denny <address@hidden>
Date: Fri, 11 Sep 2009 02:54:53 -0400
Subject: [PATCH] yysyntax_error: avoid duplicate lookahead collection.

Except when memory reallocation is required, this change
eliminates the need to invoke yysyntax_error twice and thus to
repeat the collection of lookaheads.  It also prepares for
future extensions that will make those repetitions more
expensive and that will require additional memory management in
yysyntax_error.  Finally, it fixes an obscure bug already
exercised in the test suite.
* data/yacc.c (yysyntax_error): Add arguments for message
buffer variables stored in the parser.  Instead of size, return
status similar to yyparse status but indicating success of
message creation.  Other than the actual reallocation of the
message buffer, import and clean up memory management code
from...
(yyparse, yypush_parse): ... here.
* tests/regression.at (-Dparse.error=verbose overflow): No
longer an expected failure.
---
 ChangeLog           |   20 ++
 data/yacc.c         |  142 ++++++-----
 src/parse-gram.c    |  686 ++++++++++++++++++++++++++-------------------------
 src/parse-gram.h    |   12 +-
 tests/regression.at |   11 +-
 5 files changed, 452 insertions(+), 419 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index b508741..4d877c0 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,23 @@
+2009-09-11  Joel E. Denny  <address@hidden>
+
+       yysyntax_error: avoid duplicate lookahead collection.
+       Except when memory reallocation is required, this change
+       eliminates the need to invoke yysyntax_error twice and thus to
+       repeat the collection of lookaheads.  It also prepares for
+       future extensions that will make those repetitions more
+       expensive and that will require additional memory management in
+       yysyntax_error.  Finally, it fixes an obscure bug already
+       exercised in the test suite.
+       * data/yacc.c (yysyntax_error): Add arguments for message
+       buffer variables stored in the parser.  Instead of size, return
+       status similar to yyparse status but indicating success of
+       message creation.  Other than the actual reallocation of the
+       message buffer, import and clean up memory management code
+       from...
+       (yyparse, yypush_parse): ... here.
+       * tests/regression.at (-Dparse.error=verbose overflow): No
+       longer an expected failure.
+
 2009-09-10  Joel E. Denny  <address@hidden>
 
        yacc.c: test yysyntax_error memory management more.
diff --git a/data/yacc.c b/data/yacc.c
index 26c5996..c98cd55 100644
--- a/data/yacc.c
+++ b/data/yacc.c
@@ -889,26 +889,27 @@ yytnamerr (char *yyres, const char *yystr)
 }
 # endif
 
-/* Copy into YYRESULT an error message about the unexpected token
-   YYTOKEN while in state YYSTATE.  Return the number of bytes copied,
-   including the terminating null byte.  If YYRESULT is null, do not
-   copy anything; just return the number of bytes that would be
-   copied.  As a special case, return 0 if an ordinary "syntax error"
-   message will do.  Return YYSIZE_MAXIMUM if overflow occurs during
-   size calculation.  */
-static YYSIZE_T
-yysyntax_error (char *yyresult, int yystate, int yytoken)
+/* Copy into *YYMSG, which is of size *YYMSG_ALLOC, an error message
+   about the unexpected token YYTOKEN while in state YYSTATE.
+
+   Return 0 if *YYMSG was successfully written.  Return 1 if an ordinary
+   "syntax error" message will suffice instead.  Return 2 if *YYMSG is
+   not large enough to hold the message.  In the last case, also set
+   *YYMSG_ALLOC to either (a) the required number of bytes or (b) zero
+   if the required number of bytes is too large to store.  */
+static int
+yysyntax_error (YYSIZE_T *yymsg_alloc, char **yymsg,
+                int yystate, int yytoken)
 {
   int yyn = yypact[yystate];
 
   if (! (YYPACT_NINF < yyn && yyn <= YYLAST))
-    return 0;
+    return 1;
   else
     {
       YYSIZE_T yysize0 = yytnamerr (0, yytname[yytoken]);
       YYSIZE_T yysize = yysize0;
       YYSIZE_T yysize1;
-      int yysize_overflow = 0;
       enum { YYERROR_VERBOSE_ARGS_MAXIMUM = 5 };
       /* Internationalized format string. */
       const char *yyformat = 0;
@@ -942,11 +943,17 @@ yysyntax_error (char *yyresult, int yystate, int yytoken)
              }
            yyarg[yycount++] = yytname[yyx];
            yysize1 = yysize + yytnamerr (0, yytname[yyx]);
-           yysize_overflow |= (yysize1 < yysize);
+           if (! (yysize <= yysize1
+                  && yysize1 <= YYSTACK_ALLOC_MAXIMUM))
+             {
+               /* Overflow.  */
+               *yymsg_alloc = 0;
+               return 2;
+             }
            yysize = yysize1;
          }
 
-        switch (yycount)
+      switch (yycount)
         {
 #define YYCASE_(N, S)                           \
           case N:                               \
@@ -961,32 +968,42 @@ yysyntax_error (char *yyresult, int yystate, int yytoken)
         }
 
       yysize1 = yysize + yystrlen (yyformat);
-      yysize_overflow |= (yysize1 < yysize);
+      if (! (yysize <= yysize1 && yysize1 <= YYSTACK_ALLOC_MAXIMUM))
+        {
+          /* Overflow.  */
+          *yymsg_alloc = 0;
+          return 2;
+        }
       yysize = yysize1;
 
-      if (yysize_overflow)
-       return YYSIZE_MAXIMUM;
+      if (*yymsg_alloc < yysize)
+        {
+          *yymsg_alloc = 2 * yysize;
+          if (! (yysize <= *yymsg_alloc
+                 && *yymsg_alloc <= YYSTACK_ALLOC_MAXIMUM))
+            *yymsg_alloc = YYSTACK_ALLOC_MAXIMUM;
+          return 2;
+        }
 
-      if (yyresult)
-       {
-         /* Avoid sprintf, as that infringes on the user's name space.
-            Don't have undefined behavior even if the translation
-            produced a string with the wrong number of "%s"s.  */
-         char *yyp = yyresult;
-         int yyi = 0;
-         while ((*yyp = *yyformat) != '\0')
-            if (*yyp == '%' && yyformat[1] == 's' && yyi < yycount)
-              {
-                yyp += yytnamerr (yyp, yyarg[yyi++]);
-                yyformat += 2;
-              }
-            else
-              {
-                yyp++;
-                yyformat++;
-              }
-       }
-      return yysize;
+      /* Avoid sprintf, as that infringes on the user's name space.
+         Don't have undefined behavior even if the translation
+         produced a string with the wrong number of "%s"s.  */
+      {
+        char *yyp = *yymsg;
+        int yyi = 0;
+        while ((*yyp = *yyformat) != '\0')
+          if (*yyp == '%' && yyformat[1] == 's' && yyi < yycount)
+            {
+              yyp += yytnamerr (yyp, yyarg[yyi++]);
+              yyformat += 2;
+            }
+          else
+            {
+              yyp++;
+              yyformat++;
+            }
+      }
+      return 0;
     }
 }
 #endif /* YYERROR_VERBOSE */
@@ -1431,37 +1448,28 @@ yyerrlab:
 #if ! YYERROR_VERBOSE
       yyerror (]b4_yyerror_args[YY_("syntax error"));
 #else
-      {
-       YYSIZE_T yysize = yysyntax_error (0, yystate, yytoken);
-       if (yymsg_alloc < yysize && yymsg_alloc < YYSTACK_ALLOC_MAXIMUM)
-         {
-           YYSIZE_T yyalloc = 2 * yysize;
-           if (! (yysize <= yyalloc && yyalloc <= YYSTACK_ALLOC_MAXIMUM))
-             yyalloc = YYSTACK_ALLOC_MAXIMUM;
-           if (yymsg != yymsgbuf)
-             YYSTACK_FREE (yymsg);
-           yymsg = (char *) YYSTACK_ALLOC (yyalloc);
-           if (yymsg)
-             yymsg_alloc = yyalloc;
-           else
-             {
-               yymsg = yymsgbuf;
-               yymsg_alloc = sizeof yymsgbuf;
-             }
-         }
-
-       if (0 < yysize && yysize <= yymsg_alloc)
-         {
-           (void) yysyntax_error (yymsg, yystate, yytoken);
-           yyerror (]b4_yyerror_args[yymsg);
-         }
-       else
-         {
-           yyerror (]b4_yyerror_args[YY_("syntax error"));
-           if (yysize != 0)
-             goto yyexhaustedlab;
-         }
-      }
+      while (1)
+        {
+          int yysyntax_error_status =
+            yysyntax_error (&yymsg_alloc, &yymsg, yystate, yytoken);
+          if (yysyntax_error_status == 2 && yymsg_alloc > 0)
+            {
+              if (yymsg != yymsgbuf)
+                YYSTACK_FREE (yymsg);
+              yymsg = (char *) YYSTACK_ALLOC (yymsg_alloc);
+              if (yymsg)
+                continue;
+              yymsg = yymsgbuf;
+              yymsg_alloc = sizeof yymsgbuf;
+            }
+          if (yysyntax_error_status == 0)
+            yyerror (]b4_yyerror_args[yymsg);
+          else
+            yyerror (]b4_yyerror_args[YY_("syntax error"));
+          if (yysyntax_error_status == 2)
+            goto yyexhaustedlab;
+          break;
+        }
 #endif
     }
 
diff --git a/tests/regression.at b/tests/regression.at
index 7e17a91..efe72af 100644
--- a/tests/regression.at
+++ b/tests/regression.at
@@ -1327,16 +1327,13 @@ AT_CLEANUP
 # Imagine the case where YYSTACK_ALLOC_MAXIMUM = YYSIZE_MAXIMUM and an
 # invocation of yysyntax_error has caused yymsg_alloc to grow to exactly
 # YYSTACK_ALLOC_MAXIMUM (perhaps because the normal doubling of size had
-# to be clipped to YYSTACK_ALLOC_MAXIMUM).  Now imagine a subsequent
-# invocation of yysyntax_error that overflows during its size
-# calculation and thus returns YYSIZE_MAXIMUM to yyparse.  Then, yyparse
-# will invoke yyerror using the old contents of yymsg.  This bug needs
-# to be fixed.
+# to be clipped to YYSTACK_ALLOC_MAXIMUM).  In an old version of yacc.c,
+# a subsequent invocation of yysyntax_error that overflows during its
+# size calculation would return YYSIZE_MAXIMUM to yyparse.  Then,
+# yyparse would invoke yyerror using the old contents of yymsg.
 
 AT_SETUP([[-Dparse.error=verbose overflow]])
 
-AT_XFAIL_IF([[:]])
-
 AT_DATA_GRAMMAR([input.y],
 [[%code {
   #include <stdio.h>
-- 
1.5.4.3

reply via email to

[Prev in Thread] Current Thread [Next in Thread]