[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Conflict counterexamples
From: |
Akim Demaille |
Subject: |
Re: Conflict counterexamples |
Date: |
Sat, 6 Jun 2020 08:47:00 +0200 |
Hi Adrian,
Thanks a lot for the feedback!
> Le 5 juin 2020 à 16:55, Adrian Vogelsgesang <avogelsgesang@tableau.com> a
> écrit :
>
> Hi Akim, hi Vincent
>
>> 1. Where
> I would definitely prefer to have it on the command line instead of the
> .output file.
>
> First, our CI and our build scripts at the project I am working on don't keep
> the .output file around, but they report stderr/stdout. We could of course
> adjust our build scripts, but stderr would make my life easier.
I don't see the role played by the CI here. Counterexample generation is
useless unless you are working on the grammar itself. The CI should not play
any role here, rather it should monitor %expect etc. to make sure no-one
creates regressions. To me, expecting counterexamples from the CI is
comparable to using the CI to see if a C program compiles.
Studying the counterexamples is something that should happen when you are
preparing your commit, not when you push it.
> Furthermore for myself when debugging a grammar having to open the .output
> file is always an extra step. I would prefer not to have to do that extra
> step.
I understand that, but the output file contains very valuable information that
the command line cannot give you. Besides, the HTML output (e.g.,
https://www.lrde.epita.fr/~akim/private/bison/parse-gram.html) makes it way
easier to navigate through states, rules and symbols, thanks to hyperlinks.
Something the terminal cannot give you.
> And last, but not least: Afaict, bison prints errors in a gcc-compatible
> format. Many IDEs support directly highlighting the parts of your code which
> correspond to a warning produced by gcc. It would be amazing if I could see
> the errors, including counterexamples directly in vim or VisualStudio Code -
> however I never tried and I am not sure if it would actually work
Where I truly value the diagnostics on stderr, with compatibility with GCC
etc., is for things which are related to the specific parts of the input file.
On this regards, conflicts and counterexamples are on a completely different
level: they are related to the automaton, its states and transitions. They
don't have a location in the source file.
That's exactly where the reports (*.output or *.html) shine: they show the
automaton, something the grammar cannot give you.
I have not practiced -Wcounterexample in real life while working on a grammar,
so I can certainly be biased; but so far, the reports are the single most
valuable source of information to understand your parser, and debug/tune it.
That's why I would expect anyone willing to debug her parser to look at the
reports, and to expect the counterexamples to be there.
In my running example:
$ cat /tmp/foo.y
%%
exp
: "if" exp "then" exp
| "if" exp "then" exp "else" exp
| "num"
since the S/R conflict is in state 7, I would like the report to include facts
about the conflict, e.g.:
State 7
1 exp: "if" exp "then" exp . [$end, "then", "else"]
2 | "if" exp "then" exp . "else" exp
"else" shift, and go to state 8
"else" [reduce using rule 1 (exp)]
$default reduce using rule 1 (exp)
Shift-reduce conflict on symbol "else"
Example "if" exp "then" "if" exp "then" exp • "else" exp
First derivation exp ::=[ "if" exp "then" exp ::=[ "if" exp "then"
exp • ] "else" exp ]
Second derivation exp ::=[ "if" exp "then" exp ::=[ "if" exp "then"
exp • "else" exp ] ]
>> 3. How
> Preferably in a format such that vim/VisualStudio Code can pick it up and
> render it next to the corresponding lines.
I don't see what correspondance you mean. Conflicts don't have a source
location.
It could make sense with one of my proposals showing one derivation at a time,
then we could relate to the source lines of each of the rules that participates.
It would also make sense to show the path through the automaton states in the
report files.
> Among the two styles which you shared, I personally prefer the first one,
> i.e.
> [ "if" exp "then" exp ::=[ "if" exp "then" exp • ] "else" exp ]
> vs.
> ["if" exp "then" exp ::=[ "if" exp "then" exp • "else" exp ] ]
> since that makes the ambiguity more obvious to me.
This is super frustrating... The mailing list removed everything from my
message:
- it removed the colors
- it removed all the attachments
So, sure, I understand what you mean, since you couldn't actually see what I
meant :(
See
1. The diagnostics with colors
https://www.lrde.epita.fr/~akim/private/bison/bison-cex-color.png
2. One way to draw the derivations
https://www.lrde.epita.fr/~akim/private/bison/deriv.pdf
3. Another one
https://www.lrde.epita.fr/~akim/private/bison/deriv-2.pdf
These graphs can certainly be rendered more nicely if we spend time tuning
graphviz. We can look for means to embed them in the HTML output (well, very
not straightforward, but doable). Colors might be useful.
As another example, here's the case of the AWK grammar with colored examples
(different rendering from the toy example):
https://www.lrde.epita.fr/~akim/private/bison/awk.html
I'll experiment with coloring the derivations too, with the same colors as the
corresponding example.
> Those are the most valuable two lines to me and they should be right next to
> each other. By comparing those two lines, the ambiguity becomes obvious.
In simple cases it is straightforward, agreed. But have a look at the AWK
case: in more complex cases, the rewritings are too much visual noise to my
eyes, and I almost need to draw the derivation tree to see what actually
happens.
> In your example, it took me some time to actually spot them. I am not sure
> what all those other lines actually tell me, I would prefer to reduce the
> verbosity a bit.
First legibility, then conciseness.
> To make it even simpler to spot the ambiguities, one could align the two
> derivation under each over horizontally, such that the different position of
> the "]" becomes more obvious.
I'm not sure I see what you mean here. The derivations may go through very
different symbols, so except for the root, I don't see what could be aligned.
Aligning with the symbols of the counterexample makes a lot of sense though.
But we might go through intermediate symbols with long names which would
require a lot of horizontal space.
> The 2nd example makes it easier to understand the "state of the parse stack"
> than the first example.
> E.g., it was easier to me to understand which of the derivations corresponds
> to a reduce ("finish this level and go back up"), and which one corresponds
> to a shift ("stay on the same level").
> The ambiguity on the other hand is less obvious since it only becomes
> apparent after mentally combining multiple lines.
I had the opposite problem :) I needed to unfold the lines to see the trees.
Vincent's rendering of derivation trees is new to me, I never used/saw that
before. I'm used to the classical derivation trees, or leftmost derivation
sequences. And I believe that's the common knowledge. Vincent's approach is
super nice to fit all the needed data in a one liner, but the mixture is hard
to read is rich cases.
> Usually I care more about the "what is ambiguous" than about "is it a reduce
> or a shift".
> That's because to fix my grammar I have to understand "why is it ambiguous?"
> first.
Actually we're not just talking about ambiguities, but of conflicts.
> Only after that I can consider my options ("introduce a new keyword",
> "restructure my language", "use %prec to resolve the conflict", "change
> operator precedence", ...).
> Personally, I usually settle for one of the first two options, i.e. changing
> the language. And for those options I never actually care "which one is the
> shift, which one is the reduce".
Sorry, I don't see what you mean. I have shown other ways to display the
derivations, and nothing about the LR automaton. None of my proposals were
about shifts and reduces, so I don't understand what you are referring to.
> The distinction "which one is the reduce, which one is the shift" is really
> only necessary if I want to resolve the conflict by annotating my grammar
> with corresponding precedence rules.
Or massaging the grammar.
> I am currently wondering, if we might want to simplify this use case, too,
> and add hints like "to choose derivation one, add %prec ..." to the error
> message
Wow... That's super ambitious. You'd have to foretell the impact of your
directive, which basically boils down to running the automaton construction
again.
We can add recipes in the documentation though.
Cheers!