bison-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH] Fix handling of yychar manipulation in user semantic actions.


From: Joel E. Denny
Subject: [PATCH] Fix handling of yychar manipulation in user semantic actions.
Date: Wed, 16 Dec 2009 21:29:23 -0500 (EST)
User-agent: Alpine 1.00 (DEB 882 2007-12-20)

A few months ago, I started talking about a lookahead correction mechanism 
I was working on for Bison.  Its most obvious benefit is to correct the 
list of expected tokens in verbose syntax error messages.  I'm finally 
getting back to pushing that patch series through.

The following patch is next.  The problem it addresses is that users 
sometimes write semantic actions that manipulate yychar.  Of course, they 
can do this with YYBACKUP or yyclearin.  However, my impression is that 
many users alter yychar directly.  In this case, the translation in 
yytoken isn't always updated, and bad destructor calls or syntax error 
messages can result.  This patch fixes that.

I'd like to push this patch to master and branch-2.5.  Any feedback would 
be appreciated.  I'd especially like to hear comments on my analysis of 
glr.c, lalr1.cc, and lalr1.java.

Don't worry about the unfixed bug I mention in the test suite.  The next 
patch will more thoroughly explain it and fix it.

>From cfdf80e9e688706a449dc68662e5e4ad0674ab01 Mon Sep 17 00:00:00 2001
From: Joel E. Denny <address@hidden>
Date: Wed, 16 Dec 2009 20:55:40 -0500
Subject: [PATCH] Fix handling of yychar manipulation in user semantic actions.

The problem was that yacc.c didn't always update the yychar
translation afterwards.  However, other skeletons appear to be
fine.  glr.c appears to already translate yychar before every
use.  lalr1.cc does not define yychar and does not document its
replacement, yyla, for users.  In lalr1.java, yychar is out of
scope during semantic actions.
* NEWS (2.5): Document.
* data/yacc.c (YYBACKUP): Don't bother translating yychar into
yytoken here.
(yyparse, yypush_parse): Instead, translate before every use of
yytoken, and add comments explaining this approach.
* tests/actions.at (Destroying lookahead assigned by semantic
action): New test group checking that translation happens before
lookahead destructor calls at parser return.  Previously,
incorrect destructors were called.
* tests/conflicts.at (parse.error=verbose and consistent
errors): New test group checking that translation happens at
syntax error detection before the associated verbose error
message and the associated lookahead destructor calls.  While
the destructor call is fixed by this patch, the verbose error
message is currently incorrect due to another bug (see
comments in test group), so this is an expected failure for now.
---
 ChangeLog          |   26 +++
 NEWS               |    7 +
 data/yacc.c        |   25 +++-
 src/parse-gram.c   |  522 +++++++++++++++++++++++-----------------------------
 src/parse-gram.h   |   68 +------
 tests/actions.at   |   71 +++++++
 tests/conflicts.at |  115 ++++++++++++
 7 files changed, 480 insertions(+), 354 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 2817f11..cd8016a 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,31 @@
 2009-12-16  Joel E. Denny  <address@hidden>
 
+       Fix handling of yychar manipulation in user semantic actions.
+       The problem was that yacc.c didn't always update the yychar
+       translation afterwards.  However, other skeletons appear to be
+       fine.  glr.c appears to already translate yychar before every
+       use.  lalr1.cc does not define yychar and does not document its
+       replacement, yyla, for users.  In lalr1.java, yychar is out of
+       scope during semantic actions.
+       * NEWS (2.5): Document.
+       * data/yacc.c (YYBACKUP): Don't bother translating yychar into
+       yytoken here.
+       (yyparse, yypush_parse): Instead, translate before every use of
+       yytoken, and add comments explaining this approach.
+       * tests/actions.at (Destroying lookahead assigned by semantic
+       action): New test group checking that translation happens before
+       lookahead destructor calls at parser return.  Previously,
+       incorrect destructors were called.
+       * tests/conflicts.at (parse.error=verbose and consistent
+       errors): New test group checking that translation happens at
+       syntax error detection before the associated verbose error
+       message and the associated lookahead destructor calls.  While
+       the destructor call is fixed by this patch, the verbose error
+       message is currently incorrect due to another bug (see
+       comments in test group), so this is an expected failure for now.
+
+2009-12-16  Joel E. Denny  <address@hidden>
+
        Add gcc's -Wundef to test suite and fix another warning from it.
        * NEWS (2.4.2): Update description of -Wundef fix.
        * configure.ac (WARN_CXXFLAGS_TEST): New substitution.
diff --git a/NEWS b/NEWS
index 92567dd..8aaf9e0 100644
--- a/NEWS
+++ b/NEWS
@@ -212,6 +212,13 @@ Bison News
   were resolved with %nonassoc.  Such tokens are now properly omitted
   from the list.
 
+** Destructor calls fixed for lookaheads altered in semantic actions.
+
+  Previously for deterministic parsers in C, if a user semantic action
+  altered yychar, the parser in some cases used the old yychar value to
+  determine which destructor to call for the lookahead upon a syntax
+  error or upon parser return.  This bug has been fixed.
+
 * Changes in version 2.4.2 (????-??-??):
 
 ** Detection of GNU M4 1.4.6 or newer during configure is improved.
diff --git a/data/yacc.c b/data/yacc.c
index 11ebbf1..dce9a6a 100644
--- a/data/yacc.c
+++ b/data/yacc.c
@@ -621,7 +621,6 @@ do                                                          
\
     {                                                          \
       yychar = (Token);                                                \
       yylval = (Value);                                                \
-      yytoken = YYTRANSLATE (yychar);                          \
       YYPOPSTACK (1);                                          \
       goto yybackup;                                           \
     }                                                          \
@@ -1420,6 +1419,17 @@ yyreduce:
       ]b4_user_actions[
       default: break;
     }
+  /* User semantic actions sometimes alter yychar, and that requires
+     that yytoken be updated with the new translation.  We take the
+     approach of translating immediately before every use of yytoken.
+     One alternative is translating here after every semantic action,
+     but that translation would be missed if the semantic action invokes
+     YYABORT, YYACCEPT, or YYERROR immediately after altering yychar or
+     if it invokes YYBACKUP.  In the case of YYABORT or YYACCEPT, an
+     incorrect destructor might then be invoked immediately.  In the
+     case of YYERROR or YYBACKUP, subsequent parser actions might lead
+     to an incorrect destructor call or verbose syntax error message
+     before the lookahead is translated.  */
   YY_SYMBOL_PRINT ("-> $$ =", yyr1[yyn], &yyval, &yyloc);
 
   YYPOPSTACK (yylen);
@@ -1448,6 +1458,10 @@ yyreduce:
 | yyerrlab -- here on detecting error |
 `------------------------------------*/
 yyerrlab:
+  /* Make sure we have latest lookahead translation.  See comments at
+     user semantic actions for why this is necessary.  */
+  yytoken = YYTRANSLATE (yychar);
+
   /* If not already recovering from an error, report this error.  */
   if (!yyerrstatus)
     {
@@ -1600,8 +1614,13 @@ yyexhaustedlab:
 
 yyreturn:
   if (yychar != YYEMPTY)
-     yydestruct ("Cleanup: discarding lookahead",
-                yytoken, &yylval]b4_locations_if([, &yylloc])[]b4_user_args[);
+    {
+      /* Make sure we have latest lookahead translation.  See comments at
+         user semantic actions for why this is necessary.  */
+      yytoken = YYTRANSLATE (yychar);
+      yydestruct ("Cleanup: discarding lookahead",
+                  yytoken, &yylval]b4_locations_if([, 
&yylloc])[]b4_user_args[);
+    }
   /* Do not reclaim the symbols of the rule which action triggered
      this YYABORT or YYACCEPT.  */
   YYPOPSTACK (yylen);
diff --git a/tests/actions.at b/tests/actions.at
index d317920..144791c 100644
--- a/tests/actions.at
+++ b/tests/actions.at
@@ -1408,3 +1408,74 @@ AT_MATCHES_CHECK([input.c], [[// TEST:Y:1 
[;{}]*\n;\}$]],  [[12]])
 AT_MATCHES_CHECK([input.c], [[#define TEST_MACRO_N \\\n\[\]"broken\\" \$ \@ 
\$\$ address@hidden \[\];\\\nstring;"\}]], [[2]])
 
 AT_CLEANUP
+
+
+## -------------------------------------------------- ##
+## Destroying lookahead assigned by semantic action.  ##
+## -------------------------------------------------- ##
+
+AT_SETUP([[Destroying lookahead assigned by semantic action]])
+
+AT_DATA_GRAMMAR([input.y],
+[[
+%code {
+  #include <assert.h>
+  #include <stdio.h>
+  static void yyerror (char const *);
+  static int yylex (void);
+  #define USE(Var)
+}
+
+%destructor { fprintf (stderr, "'a' destructor\n"); } 'a'
+%destructor { fprintf (stderr, "'b' destructor\n"); } 'b'
+
+%%
+
+// In a previous version of Bison, yychar assigned by the semantic
+// action below was not translated into yytoken before the lookahead was
+// discarded and thus before its destructor (selected according to
+// yytoken) was called in order to return from yyparse.  This would
+// happen even if YYACCEPT was performed in a later semantic action as
+// long as only consistent states with default reductions were visited
+// in between.  However, we leave YYACCEPT in the same semantic action
+// for this test in order to show that skeletons cannot simply translate
+// immediately after every semantic action because a semantic action
+// that has set yychar might not always return normally.  Instead,
+// skeletons must translate before every use of yytoken.
+start: 'a' accept { USE($1); } ;
+accept: /*empty*/ {
+  assert (yychar == YYEMPTY);
+  yychar = 'b';
+  YYACCEPT;
+} ;
+
+%%
+
+static void
+yyerror (char const *msg)
+{
+  fprintf (stderr, "%s\n", msg);
+}
+
+static int
+yylex (void)
+{
+  static char const *input = "a";
+  return *input++;
+}
+
+int
+main (void)
+{
+  return yyparse ();
+}
+]])
+
+AT_BISON_CHECK([[-o input.c input.y]])
+AT_COMPILE([[input]])
+AT_PARSER_CHECK([[./input]], [[0]], [],
+[['b' destructor
+'a' destructor
+]])
+
+AT_CLEANUP
diff --git a/tests/conflicts.at b/tests/conflicts.at
index 26ec08d..11eacd4 100644
--- a/tests/conflicts.at
+++ b/tests/conflicts.at
@@ -139,6 +139,121 @@ AT_CLEANUP
 
 
 
+## ------------------------------------------- ##
+## parse.error=verbose and consistent errors.  ##
+## ------------------------------------------- ##
+
+AT_SETUP([[parse.error=verbose and consistent errors]])
+
+m4_pushdef([AT_CONSISTENT_ERRORS_CHECK], [
+
+AT_BISON_CHECK([$1[ -o input.c input.y]])
+AT_COMPILE([[input]])
+
+m4_pushdef([AT_EXPECTING], [m4_if($2, [ab], [[, expecting 'a' or 'b']],
+                                  $2, [a],  [[, expecting 'a']],
+                                  $2, [b],  [[, expecting 'b']])])
+
+AT_PARSER_CHECK([[./input]], [[1]], [],
+[[syntax error, unexpected $end]AT_EXPECTING[
+]])
+
+m4_popdef([AT_EXPECTING])
+
+])
+
+AT_DATA_GRAMMAR([input.y],
+[[%code {
+  #include <assert.h>
+  #include <stdio.h>
+  int yylex (void);
+  void yyerror (char const *);
+  #define USE(Var)
+}
+
+%define parse.error verbose
+
+// The point isn't to test IELR here, but state merging happens to
+// complicate the example.
+%define lr.type ielr
+
+%nonassoc 'a'
+
+// If yylval=0 here, then we know that the 'a' destructor is being
+// invoked incorrectly for the $end set in the semantic action below.
+// All other tokens are returned by yylex, which sets yylval=1.
+%destructor {
+  if (!$$)
+    fprintf (stderr, "Wrong destructor.\n");
+} 'a'
+
+%%
+
+// The lookahead isn't needed before the error action is encountered.
+// In a previous version of Bison, this was a problem as it meant yychar
+// was not translated into yytoken before error recovery, which discards
+// the lookahead and thus invokes a destructor that it selected
+// according to the incorrect yytoken.  That missing translation also
+// would have caused the wrong unexpected token to be reported except
+// that, due to another bug, the unexpected token is not reported at all
+// because the error is detected in a consistent state with an error
+// action.  That bug still needs to be fixed.
+start: consistent-error-on-a-a 'a' { USE ($2); } ;
+
+consistent-error-on-a-a:
+    'a' consistent-reduction
+  | 'a' consistent-reduction 'a' { USE (($1, $3)); }
+  ;
+
+consistent-reduction: /*empty*/ {
+  assert (yychar == YYEMPTY);
+  yylval = 0;
+  yychar = 0;
+} ;
+
+// Provide another context in which all rules are useful so that this
+// test case looks a little more realistic.
+start: 'b' consistent-error-on-a-a 'b' ;
+
+%%
+
+int
+yylex (void)
+{
+  static char const *input = "a";
+  yylval = 1;
+  return *input++;
+}
+
+void
+yyerror (char const *msg)
+{
+  fprintf (stderr, "%s\n", msg);
+}
+
+int
+main (void)
+{
+  return yyparse ();
+}
+]])
+
+# See comments in grammar for why this test doesn't succeed.
+AT_XFAIL_IF([[:]])
+
+AT_CONSISTENT_ERRORS_CHECK([], [[none]])
+AT_CONSISTENT_ERRORS_CHECK([[-Dlr.default-reductions=consistent]], [[none]])
+
+# Canonical LR doesn't foresee the error for 'a'!
+AT_CONSISTENT_ERRORS_CHECK([[-Dlr.default-reductions=accepting]], [[a]])
+AT_CONSISTENT_ERRORS_CHECK([[-Flr.type=canonical-lr]], [[a]])
+
+m4_popdef([AT_CONSISTENT_ERRORS_CHECK])
+
+AT_CLEANUP
+
+
+
 ## ------------------------- ##
 ## Unresolved SR Conflicts.  ##
 ## ------------------------- ##
-- 
1.5.4.3





reply via email to

[Prev in Thread] Current Thread [Next in Thread]