bison-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] Fix handling of yychar manipulation in user semantic actions


From: Joel E. Denny
Subject: Re: [PATCH] Fix handling of yychar manipulation in user semantic actions.
Date: Sun, 20 Dec 2009 03:17:59 -0500 (EST)
User-agent: Alpine 1.00 (DEB 882 2007-12-20)

On Wed, 16 Dec 2009, Joel E. Denny wrote:

> A few months ago, I started talking about a lookahead correction mechanism 
> I was working on for Bison.  Its most obvious benefit is to correct the 
> list of expected tokens in verbose syntax error messages.  I'm finally 
> getting back to pushing that patch series through.
> 
> The following patch is next.  The problem it addresses is that users 
> sometimes write semantic actions that manipulate yychar.  Of course, they 
> can do this with YYBACKUP or yyclearin.  However, my impression is that 
> many users alter yychar directly.  In this case, the translation in 
> yytoken isn't always updated, and bad destructor calls or syntax error 
> messages can result.  This patch fixes that.
> 
> I'd like to push this patch to master and branch-2.5.  Any feedback would 
> be appreciated.  I'd especially like to hear comments on my analysis of 
> glr.c, lalr1.cc, and lalr1.java.
> 
> Don't worry about the unfixed bug I mention in the test suite.  The next 
> patch will more thoroughly explain it and fix it.
> 
> >From cfdf80e9e688706a449dc68662e5e4ad0674ab01 Mon Sep 17 00:00:00 2001
> From: Joel E. Denny <address@hidden>
> Date: Wed, 16 Dec 2009 20:55:40 -0500
> Subject: [PATCH] Fix handling of yychar manipulation in user semantic actions.

I botched the last test group.  It exercised the lookahead destructor at 
parser return rather than the one during syntax error recovery.  Here's a 
revised patch.  I also extended the ChangeLog's discussion of lalr1.cc a 
little.

>From 575e4443e4993046e0b812ea1ef5ef3cbd4480a0 Mon Sep 17 00:00:00 2001
From: Joel E. Denny <address@hidden>
Date: Sun, 20 Dec 2009 02:22:53 -0500
Subject: [PATCH] Fix handling of yychar manipulation in user semantic actions.

The problem was that yacc.c didn't always update the yychar
translation afterwards.  However, other skeletons appear to be
fine.  glr.c appears to already translate yychar before every
use.  lalr1.cc does not define yychar and does not document its
replacement, yyla, for users.  It does provide yyclearin, but
that does not manipulate yyla and thus requires no translation
update.  In lalr1.java, yychar is out of scope during semantic
actions.
* NEWS (2.5): Document.
* data/yacc.c (YYBACKUP): Don't bother translating yychar into
yytoken here.
(yyparse, yypush_parse): Instead, translate before every use of
yytoken, and add comments explaining this approach.
* tests/actions.at (Destroying lookahead assigned by semantic
action): New test group checking that translation happens before
lookahead destructor calls at parser return.  Previously,
incorrect destructors were called.
* tests/conflicts.at (parse.error=verbose and consistent
errors): New test group checking that translation happens at
syntax error detection before the associated verbose error
message and the associated lookahead destructor calls.  While
the destructor call is fixed by this patch, the verbose error
message is currently incorrect due to another bug (see
comments in test group), so this is an expected failure for now.
---
 ChangeLog          |   28 +++
 NEWS               |    7 +
 data/yacc.c        |   25 +++-
 src/parse-gram.c   |  522 +++++++++++++++++++++++-----------------------------
 src/parse-gram.h   |   68 +------
 tests/actions.at   |   71 +++++++
 tests/conflicts.at |  124 +++++++++++++
 7 files changed, 491 insertions(+), 354 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 1c15cc5..c53baca 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,31 @@
+2009-12-20  Joel E. Denny  <address@hidden>
+
+       Fix handling of yychar manipulation in user semantic actions.
+       The problem was that yacc.c didn't always update the yychar
+       translation afterwards.  However, other skeletons appear to be
+       fine.  glr.c appears to already translate yychar before every
+       use.  lalr1.cc does not define yychar and does not document its
+       replacement, yyla, for users.  It does provide yyclearin, but
+       that does not manipulate yyla and thus requires no translation
+       update.  In lalr1.java, yychar is out of scope during semantic
+       actions.
+       * NEWS (2.5): Document.
+       * data/yacc.c (YYBACKUP): Don't bother translating yychar into
+       yytoken here.
+       (yyparse, yypush_parse): Instead, translate before every use of
+       yytoken, and add comments explaining this approach.
+       * tests/actions.at (Destroying lookahead assigned by semantic
+       action): New test group checking that translation happens before
+       lookahead destructor calls at parser return.  Previously,
+       incorrect destructors were called.
+       * tests/conflicts.at (parse.error=verbose and consistent
+       errors): New test group checking that translation happens at
+       syntax error detection before the associated verbose error
+       message and the associated lookahead destructor calls.  While
+       the destructor call is fixed by this patch, the verbose error
+       message is currently incorrect due to another bug (see
+       comments in test group), so this is an expected failure for now.
+
 2009-12-18  Joel E. Denny  <address@hidden>
 
        lalr1.cc: don't discard non-existent lookahead on syntax error.
diff --git a/NEWS b/NEWS
index 92567dd..8aaf9e0 100644
--- a/NEWS
+++ b/NEWS
@@ -212,6 +212,13 @@ Bison News
   were resolved with %nonassoc.  Such tokens are now properly omitted
   from the list.
 
+** Destructor calls fixed for lookaheads altered in semantic actions.
+
+  Previously for deterministic parsers in C, if a user semantic action
+  altered yychar, the parser in some cases used the old yychar value to
+  determine which destructor to call for the lookahead upon a syntax
+  error or upon parser return.  This bug has been fixed.
+
 * Changes in version 2.4.2 (????-??-??):
 
 ** Detection of GNU M4 1.4.6 or newer during configure is improved.
diff --git a/data/yacc.c b/data/yacc.c
index 11ebbf1..dce9a6a 100644
--- a/data/yacc.c
+++ b/data/yacc.c
@@ -621,7 +621,6 @@ do                                                          
\
     {                                                          \
       yychar = (Token);                                                \
       yylval = (Value);                                                \
-      yytoken = YYTRANSLATE (yychar);                          \
       YYPOPSTACK (1);                                          \
       goto yybackup;                                           \
     }                                                          \
@@ -1420,6 +1419,17 @@ yyreduce:
       ]b4_user_actions[
       default: break;
     }
+  /* User semantic actions sometimes alter yychar, and that requires
+     that yytoken be updated with the new translation.  We take the
+     approach of translating immediately before every use of yytoken.
+     One alternative is translating here after every semantic action,
+     but that translation would be missed if the semantic action invokes
+     YYABORT, YYACCEPT, or YYERROR immediately after altering yychar or
+     if it invokes YYBACKUP.  In the case of YYABORT or YYACCEPT, an
+     incorrect destructor might then be invoked immediately.  In the
+     case of YYERROR or YYBACKUP, subsequent parser actions might lead
+     to an incorrect destructor call or verbose syntax error message
+     before the lookahead is translated.  */
   YY_SYMBOL_PRINT ("-> $$ =", yyr1[yyn], &yyval, &yyloc);
 
   YYPOPSTACK (yylen);
@@ -1448,6 +1458,10 @@ yyreduce:
 | yyerrlab -- here on detecting error |
 `------------------------------------*/
 yyerrlab:
+  /* Make sure we have latest lookahead translation.  See comments at
+     user semantic actions for why this is necessary.  */
+  yytoken = YYTRANSLATE (yychar);
+
   /* If not already recovering from an error, report this error.  */
   if (!yyerrstatus)
     {
@@ -1600,8 +1614,13 @@ yyexhaustedlab:
 
 yyreturn:
   if (yychar != YYEMPTY)
-     yydestruct ("Cleanup: discarding lookahead",
-                yytoken, &yylval]b4_locations_if([, &yylloc])[]b4_user_args[);
+    {
+      /* Make sure we have latest lookahead translation.  See comments at
+         user semantic actions for why this is necessary.  */
+      yytoken = YYTRANSLATE (yychar);
+      yydestruct ("Cleanup: discarding lookahead",
+                  yytoken, &yylval]b4_locations_if([, 
&yylloc])[]b4_user_args[);
+    }
   /* Do not reclaim the symbols of the rule which action triggered
      this YYABORT or YYACCEPT.  */
   YYPOPSTACK (yylen);
diff --git a/tests/actions.at b/tests/actions.at
index d317920..144791c 100644
--- a/tests/actions.at
+++ b/tests/actions.at
@@ -1408,3 +1408,74 @@ AT_MATCHES_CHECK([input.c], [[// TEST:Y:1 
[;{}]*\n;\}$]],  [[12]])
 AT_MATCHES_CHECK([input.c], [[#define TEST_MACRO_N \\\n\[\]"broken\\" \$ \@ 
\$\$ address@hidden \[\];\\\nstring;"\}]], [[2]])
 
 AT_CLEANUP
+
+
+## -------------------------------------------------- ##
+## Destroying lookahead assigned by semantic action.  ##
+## -------------------------------------------------- ##
+
+AT_SETUP([[Destroying lookahead assigned by semantic action]])
+
+AT_DATA_GRAMMAR([input.y],
+[[
+%code {
+  #include <assert.h>
+  #include <stdio.h>
+  static void yyerror (char const *);
+  static int yylex (void);
+  #define USE(Var)
+}
+
+%destructor { fprintf (stderr, "'a' destructor\n"); } 'a'
+%destructor { fprintf (stderr, "'b' destructor\n"); } 'b'
+
+%%
+
+// In a previous version of Bison, yychar assigned by the semantic
+// action below was not translated into yytoken before the lookahead was
+// discarded and thus before its destructor (selected according to
+// yytoken) was called in order to return from yyparse.  This would
+// happen even if YYACCEPT was performed in a later semantic action as
+// long as only consistent states with default reductions were visited
+// in between.  However, we leave YYACCEPT in the same semantic action
+// for this test in order to show that skeletons cannot simply translate
+// immediately after every semantic action because a semantic action
+// that has set yychar might not always return normally.  Instead,
+// skeletons must translate before every use of yytoken.
+start: 'a' accept { USE($1); } ;
+accept: /*empty*/ {
+  assert (yychar == YYEMPTY);
+  yychar = 'b';
+  YYACCEPT;
+} ;
+
+%%
+
+static void
+yyerror (char const *msg)
+{
+  fprintf (stderr, "%s\n", msg);
+}
+
+static int
+yylex (void)
+{
+  static char const *input = "a";
+  return *input++;
+}
+
+int
+main (void)
+{
+  return yyparse ();
+}
+]])
+
+AT_BISON_CHECK([[-o input.c input.y]])
+AT_COMPILE([[input]])
+AT_PARSER_CHECK([[./input]], [[0]], [],
+[['b' destructor
+'a' destructor
+]])
+
+AT_CLEANUP
diff --git a/tests/conflicts.at b/tests/conflicts.at
index 26ec08d..faaa3b9 100644
--- a/tests/conflicts.at
+++ b/tests/conflicts.at
@@ -139,6 +139,130 @@ AT_CLEANUP
 
 
 
+## ------------------------------------------- ##
+## parse.error=verbose and consistent errors.  ##
+## ------------------------------------------- ##
+
+AT_SETUP([[parse.error=verbose and consistent errors]])
+
+m4_pushdef([AT_CONSISTENT_ERRORS_CHECK], [
+
+AT_BISON_CHECK([$1[ -o input.c input.y]])
+AT_COMPILE([[input]])
+
+m4_pushdef([AT_EXPECTING], [m4_if($3, [ab], [[, expecting 'a' or 'b']],
+                                  $3, [a],  [[, expecting 'a']],
+                                  $3, [b],  [[, expecting 'b']])])
+
+AT_PARSER_CHECK([[./input]], [[1]], [],
+[[syntax error, unexpected ]$2[]AT_EXPECTING[
+]])
+
+m4_popdef([AT_EXPECTING])
+
+])
+
+AT_DATA_GRAMMAR([input.y],
+[[%code {
+  #include <assert.h>
+  #include <stdio.h>
+  int yylex (void);
+  void yyerror (char const *);
+  #define USE(Var)
+}
+
+%define parse.error verbose
+
+// The point isn't to test IELR here, but state merging happens to
+// complicate the example.
+%define lr.type ielr
+
+%nonassoc 'a'
+
+// If yylval=0 here, then we know that the 'a' destructor is being
+// invoked incorrectly for the 'b' set in the semantic action below.
+// All 'a' tokens are returned by yylex, which sets yylval=1.
+%destructor {
+  if (!$$)
+    fprintf (stderr, "Wrong destructor.\n");
+} 'a'
+
+%%
+
+// The lookahead assigned by the semantic action isn't needed before
+// either error action is encountered.  In a previous version of Bison,
+// this was a problem as it meant yychar was not translated into yytoken
+// before either error action.  The second error action thus invoked a
+// destructor that it selected according to the incorrect yytoken.  The
+// first error action would have reported an incorrect unexpected token
+// except that, due to another bug, the unexpected token is not reported
+// at all because the error action is the default action in a consistent
+// state.  That bug still needs to be fixed.
+start: error-reduce consistent-error 'a' { USE ($3); } ;
+
+error-reduce:
+  'a' 'a' consistent-reduction consistent-error 'a'
+  { USE (($1, $2, $5)); }
+| 'a' error
+  { USE ($1); }
+;
+
+consistent-reduction: /*empty*/ {
+  assert (yychar == YYEMPTY);
+  yylval = 0;
+  yychar = 'b';
+} ;
+
+consistent-error:
+  'a' { USE ($1); }
+| /*empty*/ %prec 'a'
+;
+
+// Provide another context in which all rules are useful so that this
+// test case looks a little more realistic.
+start: 'b' consistent-error 'b' ;
+
+%%
+
+int
+yylex (void)
+{
+  static char const *input = "aa";
+  yylval = 1;
+  return *input++;
+}
+
+void
+yyerror (char const *msg)
+{
+  fprintf (stderr, "%s\n", msg);
+}
+
+int
+main (void)
+{
+  return yyparse ();
+}
+]])
+
+# See comments in grammar for why this test doesn't succeed.
+AT_XFAIL_IF([[:]])
+
+AT_CONSISTENT_ERRORS_CHECK([], [['b']], [[none]])
+AT_CONSISTENT_ERRORS_CHECK([[-Dlr.default-reductions=consistent]],
+                           [['b']], [[none]])
+
+# Canonical LR doesn't foresee the error for 'a'!
+AT_CONSISTENT_ERRORS_CHECK([[-Dlr.default-reductions=accepting]],
+                           [[$end]], [[a]])
+AT_CONSISTENT_ERRORS_CHECK([[-Flr.type=canonical-lr]], [[$end]], [[a]])
+
+m4_popdef([AT_CONSISTENT_ERRORS_CHECK])
+
+AT_CLEANUP
+
+
+
 ## ------------------------- ##
 ## Unresolved SR Conflicts.  ##
 ## ------------------------- ##
-- 
1.5.4.3





reply via email to

[Prev in Thread] Current Thread [Next in Thread]