[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] doc: cex documentation
From: |
Akim Demaille |
Subject: |
Re: [PATCH] doc: cex documentation |
Date: |
Tue, 30 Jun 2020 06:47:52 +0200 |
Hi Vincent,
> Le 28 juin 2020 à 21:30, Vincent Imbimbo <vmi6@cornell.edu> a écrit :
>
> Hey Akim,
> I think I found a good place for the cex documentation in bison.texi, but it
> might still be a bit awkward. Tell me if you have any ideas for changes.
Thanks a lot for this! I made a few changes (e.g., the name is
-Wcounterexamples, plural, moving -Wcex as foremost feature of 3.7, wrap at 76
columns, remove the duplicate example in NEWS which is B&W, sort references,
etc.)
Here's how I installed your commit. I'll proceed from there. Thanks!
commit ecd731a84e8b75937bb99d5a6d4f20cb9fecf4b3
Author: Vincent Imbimbo <vmi6@cornell.edu>
Date: Sun Jun 28 15:30:17 2020 -0400
doc: cex documentation
* NEWS, doc/bison.texi: Add documentation for conflict counterexample
generation.
diff --git a/NEWS b/NEWS
index 9e1c5076..b3a5d1ea 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,30 @@ GNU Bison NEWS
** New features
+*** Counterexample Generation
+
+ When given `--report=counterexamples` or `-Wcounterexamples`, bison will
+ now output counterexamples for conflicts in the grammar. These are
+ strings in the grammar which can be parsed in two ways due to the
+ conflict. For example:
+
+ Example exp '+' exp • '/' exp
+ First derivation exp ::=[ exp ::=[ exp '+' exp • ] '/' exp ]
+ Second derivation exp ::=[ exp '+' exp ::=[ exp • '/' exp ] ]
+
+ This is a shift/reduce conflict caused by none of the operators having
+ precedence, so the example can be parsed in the two ways shown. When
+ bison cannot find an example that can be derived in two ways, it instead
+ generates two examples that are the same up until the dot:
+
+ First example expr • ID $end
+ First derivation $accept ::=[ s ::=[ a ::=[ expr • ] ID ] $end ]
+ Second example expr • ID ',' ID $end
+ Second derivation $accept ::=[ s ::=[ a ::=[ expr ::=[ expr • ID ',' ]
] ID ] $end ]
+
+ In these cases, the parser usually doesn't have enough lookahead to
+ differentiate the two given examples.
+
*** File prefix mapping
Bison learned a new argument, `--file-prefix-map OLD=NEW`. Any file path
diff --git a/doc/bison.texi b/doc/bison.texi
index 320a645b..e7225306 100644
--- a/doc/bison.texi
+++ b/doc/bison.texi
@@ -9797,12 +9797,69 @@ calc.y:19.1-7: @dwarning{warning}: nonterminal useless
in grammar: useless [@dwa
19 | @dwarning{useless: STR;}
| @dwarning{^~~~~~~}
calc.y: @dwarning{warning}: 7 shift/reduce conflicts
[@dwarning{-Wconflicts-sr}]
+calc.y: @dwarning{warning}: rerun with option '-Wcounterexamples' to generate
conflict counterexamples [@dwarning{-Wother}]
@end example
-When given @option{--report=state}, in addition to @file{calc.tab.c}, it
-creates a file @file{calc.output} with contents detailed below. The
-order of the output and the exact presentation might vary, but the
-interpretation is the same.
+When given @option{-Wcounterexamples}, @command{bison} will run a search for
+strings in your grammar that better demonstrate you
+conflicts. Counterexample generation was initially developed by Chinawat
+Isradisaikul and Andrew Myers (@pxref{Bibliography}). For @file{calc.y},
+the first printed example is:
+
+@example
+Shift/reduce conflict on token '/':
+ Example exp '+' exp • '/' exp
+ First derivation exp ::=[ exp ::=[ exp '+' exp • ] '/' exp ]
+ Example exp '+' exp • '/' exp
+ Second derivation exp ::=[ exp '+' exp ::=[ exp • '/' exp ] ]
+@end example
+
+This shows two separate derivations in the grammar for the same @code{exp}:
+@samp{e1 + e2 / e3}. The derivations show how your rules would parse the
+given example. Here, the first derivation completes a reduction when seeing
+@samp{/}, causing @samp{e1 + e2} to be grouped as an @code{exp}. The second
+derivation shifts on @samp{/}, resulting in @samp{e2 / e3} being grouped as
+an @code{exp}. Therefore, it is easy to see that adding @code{%precedence}
+directives would fix this conflict.
+
+Sometimes, the search will not find an example that can be derived in two
+ways. In these cases, counterexample generation will provide two examples
+that are the same up until the dot. Most notably, this will happen when
+your grammar requires a stronger parser (more lookahead, LR instead of
+LALR). The following example isn't LR(1):
+
+@example
+%token ID
+%%
+s: a ID
+a: expr
+expr: %empty | expr ID ','
+@end example
+
+@command{bison} reports:
+
+@example
+Shift/reduce conflict on token ID:
+ First example expr • ID $end
+ First derivation $accept ::=[ s ::=[ a ::=[ expr • ] ID ] $end ]
+ Second example expr • ID ',' ID $end
+ Second derivation $accept ::=[ s ::=[ a ::=[ expr ::=[ expr • ID ',' ] ]
ID ] $end ]
+@end example
+
+This conflict is caused by the parser not having enough information to know
+the difference between these two examples. The parser would need an
+additional lookahead token to know whether or not a comma follows the
+@code{ID} after @code{expr}. These types of conflicts tend to be more
+difficult to fix, and usually need a rework of the grammar. In this case,
+it can be fixed by changing around the recursion: @code{expr: ID | ',' expr
+ID}.
+
+Counterexamples can also be written to a file with @option{--report=cex}.
+
+Going back to the calc example, when given @option{--report=state},
+in addition to @file{calc.tab.c}, it creates a file @file{calc.output}
+with contents detailed below. The order of the output and the exact
+presentation might vary, but the interpretation is the same.
@noindent
@cindex token, useless
@@ -15211,6 +15268,7 @@ resolution. @xref{Unreachable States}.
@unnumbered Bibliography
@c Please follow the following canvas to add more references.
+@c And keep sorted alphabetically.
@table @asis
@item [Corbett 1984]
@@ -15252,6 +15310,14 @@ Look-Ahead Sets, in @cite{ACM Transactions on
Programming Languages and
Systems}, Vol.@: 4, No.@: 4 (October 1982), pp.@:
615--649. @uref{http://dx.doi.org/10.1145/69622.357187}
+@item [Isradisaikul, Myers 2015]
+Chinawat Isradisaikul, Andrew Myers,
+Finding Counterexamples from Parsing Conflicts,
+in @cite{Proceedings of the 36th ACM SIGPLAN Conference on
+Programming Language Design and Implementation} (PLDI '15),
+ACM, pp.@: 555--564.
+@uref{https://www.cs.cornell.edu/andru/papers/cupex/cupex.pdf}
+
@item [Johnson 1978]
Steven C. Johnson,
A portable compiler: theory and practice,
@@ -15345,17 +15411,18 @@ London, Department of Computer Science, TR-00-12
(December 2000).
@c LocalWords: typename emplace Wconversion Wshorten yacchack reentrancy ou
@c LocalWords: Relocatability exprs fixit Wyacc parseable fixits ffixit svg
@c LocalWords: DNDEBUG cstring Wzero workalike POPL workalikes byacc UCB
-@c LocalWords: Penello's Penello Byson Byson's Corbett's CSD TOPLAS PDP
+@c LocalWords: Penello's Penello Byson Byson's Corbett's CSD TOPLAS PDP cex
@c LocalWords: Beazley's goyacc ocamlyacc SIGACT SIGPLAN colorWarning exVal
@c LocalWords: setcolor rgbError colorError rgbNotice colorNotice derror
@c LocalWords: colorOff maincolor inlineraw darkviolet darkcyan dwarning
@c LocalWords: dnotice copyable stdint ptrdiff bufsize yyreport invariants
-@c LocalWords: xrefautomaticsectiontitle yysyntax yysymbol ARGMAX cond
+@c LocalWords: xrefautomaticsectiontitle yysyntax yysymbol ARGMAX cond RTTI
@c LocalWords: Wdangling yytoken erreur syntaxe inattendu attendait nombre
@c LocalWords: YYUNDEF SymbolKind yypcontext YYENOMEM TOKENMAX getBundle
-@c LocalWords: ResourceBundle myResources getString getName getToken
+@c LocalWords: ResourceBundle myResources getString getName getToken ylwrap
@c LocalWords: getLocation getExpectedTokens reportSyntaxError bistromathic
-@c LocalWords: TokenKind
+@c LocalWords: TokenKind Automake's rtti Wcounterexamples Chinawat PLDI
+@c LocalWords: Isradisaikul
@c Local Variables:
@c ispell-dictionary: "american"