Re: Conflict counterexamples

bison-patches
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Conflict counterexamples

From:	Akim Demaille
Subject:	Re: Conflict counterexamples
Date:	Sat, 6 Jun 2020 08:47:00 +0200
Hi Adrian,

Thanks a lot for the feedback!

> Le 5 juin 2020 à 16:55, Adrian Vogelsgesang <avogelsgesang@tableau.com> a 
> écrit :
> 
> Hi Akim, hi Vincent
> 
>> 1. Where
> I would definitely prefer to have it on the command line instead of the 
> .output file. 
> 
> First, our CI and our build scripts at the project I am working on don't keep 
> the .output file around, but they report stderr/stdout. We could of course 
> adjust our build scripts, but stderr would make my life easier.

I don't see the role played by the CI here.  Counterexample generation is 
useless unless you are working on the grammar itself.  The CI should not play 
any role here, rather it should monitor %expect etc. to make sure no-one 
creates regressions.  To me, expecting counterexamples from the CI is 
comparable to using the CI to see if a C program compiles.

Studying the counterexamples is something that should happen when you are 
preparing your commit, not when you push it.


> Furthermore for myself when debugging a grammar having to open the .output 
> file is always an extra step. I would prefer not to have to do that extra 
> step.

I understand that, but the output file contains very valuable information that 
the command line cannot give you.  Besides, the HTML output (e.g., 
https://www.lrde.epita.fr/~akim/private/bison/parse-gram.html) makes it way 
easier to navigate through states, rules and symbols, thanks to hyperlinks.  
Something the terminal cannot give you.

> And last, but not least: Afaict, bison prints errors in a gcc-compatible 
> format. Many IDEs support directly highlighting the parts of your code which 
> correspond to a warning produced by gcc. It would be amazing if I could see 
> the errors, including counterexamples directly in vim or VisualStudio Code - 
> however I never tried and I am not sure if it would actually work

Where I truly value the diagnostics on stderr, with compatibility with GCC 
etc., is for things which are related to the specific parts of the input file.  
On this regards, conflicts and counterexamples are on a completely different 
level: they are related to the automaton, its states and transitions.  They 
don't have a location in the source file.

That's exactly where the reports (*.output or *.html) shine: they show the 
automaton, something the grammar cannot give you.

I have not practiced -Wcounterexample in real life while working on a grammar, 
so I can certainly be biased; but so far, the reports are the single most 
valuable source of information to understand your parser, and debug/tune it.  
That's why I would expect anyone willing to debug her parser to look at the 
reports, and to expect the counterexamples to be there.

In my running example:

$ cat /tmp/foo.y
%%
exp
: "if" exp "then" exp
| "if" exp "then" exp "else" exp
| "num"

since the S/R conflict is in state 7, I would like the report to include facts 
about the conflict, e.g.:

State 7

    1 exp: "if" exp "then" exp .  [$end, "then", "else"]
    2    | "if" exp "then" exp . "else" exp

    "else"  shift, and go to state 8

    "else"    [reduce using rule 1 (exp)]
    $default  reduce using rule 1 (exp)

    Shift-reduce conflict on symbol "else"
    Example            "if"  exp  "then"  "if"  exp  "then"  exp  •  "else"  exp
    First  derivation  exp ::=[ "if"  exp  "then"  exp ::=[ "if"  exp  "then"  
exp  • ]  "else"  exp ]
    Second derivation  exp ::=[ "if"  exp  "then"  exp ::=[ "if"  exp  "then"  
exp  •  "else"  exp ] ]



>> 3. How
> Preferably in a format such that vim/VisualStudio Code can pick it up and 
> render it next to the corresponding lines.

I don't see what correspondance you mean.  Conflicts don't have a source 
location.

It could make sense with one of my proposals showing one derivation at a time, 
then we could relate to the source lines of each of the rules that participates.

It would also make sense to show the path through the automaton states in the 
report files.

> Among the two styles which you shared, I personally prefer the first one,
> i.e.
> [ "if"  exp  "then"  exp ::=[ "if"  exp  "then"  exp  • ]  "else"  exp ]
> vs.
> ["if"  exp  "then"  exp ::=[ "if"  exp  "then"  exp  •  "else"  exp ] ]
> since that makes the ambiguity more obvious to me.

This is super frustrating...  The mailing list removed everything from my 
message:
- it removed the colors
- it removed all the attachments

So, sure, I understand what you mean, since you couldn't actually see what I 
meant :(

See

1. The diagnostics with colors
   https://www.lrde.epita.fr/~akim/private/bison/bison-cex-color.png

2. One way to draw the derivations
   https://www.lrde.epita.fr/~akim/private/bison/deriv.pdf

3. Another one
   https://www.lrde.epita.fr/~akim/private/bison/deriv-2.pdf

These graphs can certainly be rendered more nicely if we spend time tuning 
graphviz.  We can look for means to embed them in the HTML output (well, very 
not straightforward, but doable).  Colors might be useful.

As another example, here's the case of the AWK grammar with colored examples 
(different rendering from the toy example):

https://www.lrde.epita.fr/~akim/private/bison/awk.html

I'll experiment with coloring the derivations too, with the same colors as the 
corresponding example.

> Those are the most valuable two lines to me and they should be right next to 
> each other. By comparing those two lines, the ambiguity becomes obvious.

In simple cases it is straightforward, agreed.  But have a look at the AWK 
case: in more complex cases, the rewritings are too much visual noise to my 
eyes, and I almost need to draw the derivation tree to see what actually 
happens.


> In your example, it took me some time to actually spot them. I am not sure 
> what all those other lines actually tell me, I would prefer to reduce the 
> verbosity a bit.

First legibility, then conciseness.

> To make it even simpler to spot the ambiguities, one could align the two 
> derivation under each over horizontally, such that the different position of 
> the "]" becomes more obvious.

I'm not sure I see what you mean here.  The derivations may go through very 
different symbols, so except for the root, I don't see what could be aligned.  
Aligning with the symbols of the counterexample makes a lot of sense though.  
But we might go through intermediate symbols with long names which would 
require a lot of horizontal space.


> The 2nd example makes it easier to understand the "state of the parse stack" 
> than the first example.
> E.g., it was easier to me to understand which of the derivations corresponds 
> to a reduce ("finish this level and go back up"), and which one corresponds 
> to a shift ("stay on the same level").
> The ambiguity on the other hand is less obvious since it only becomes 
> apparent after mentally combining multiple lines.

I had the opposite problem :)  I needed to unfold the lines to see the trees. 
Vincent's rendering of derivation trees is new to me, I never used/saw that 
before.  I'm used to the classical derivation trees, or leftmost derivation 
sequences.  And I believe that's the common knowledge.  Vincent's approach is 
super nice to fit all the needed data in a one liner, but the mixture is hard 
to read is rich cases.


> Usually I care more about the "what is ambiguous" than about "is it a reduce 
> or a shift".
> That's because to fix my grammar I have to understand "why is it ambiguous?" 
> first.

Actually we're not just talking about ambiguities, but of conflicts.

> Only after that I can consider my options ("introduce a new keyword", 
> "restructure my language", "use %prec to resolve the conflict", "change 
> operator precedence", ...).
> Personally, I usually settle for one of the first two options, i.e. changing 
> the language. And for those options I never actually care "which one is the 
> shift, which one is the reduce".

Sorry, I don't see what you mean.  I have shown other ways to display the 
derivations, and nothing about the LR automaton.  None of my proposals were 
about shifts and reduces, so I don't understand what you are referring to.

> The distinction "which one is the reduce, which one is the shift" is really 
> only necessary if I want to resolve the conflict by annotating my grammar 
> with corresponding precedence rules.

Or massaging the grammar.

> I am currently wondering, if we might want to simplify this use case, too, 
> and add hints like "to choose derivation one, add %prec ..." to the error 
> message

Wow...  That's super ambitious.  You'd have to foretell the impact of your 
directive, which basically boils down to running the automaton construction 
again.

We can add recipes in the documentation though.

Cheers!
[Prev in Thread]
Current Thread
[Next in Thread]
Re: Conflict counterexamples, Adrian Vogelsgesang, 2020/06/05
- Re: Conflict counterexamples, Akim Demaille <=
  - Re: Conflict counterexamples, Adrian Vogelsgesang, 2020/06/06
    - Re: Conflict counterexamples, Akim Demaille, 2020/06/07
Prev by Date: Re: Conflict counterexamples
Next by Date: Re: Conflict counterexamples
Previous by thread: Re: Conflict counterexamples
Next by thread: Re: Conflict counterexamples
Index(es):
- Date
- Thread