bison-patches

lalr1.cc: use %printer in the syntax error messages


From: Akim Demaille
Subject: lalr1.cc: use %printer in the syntax error messages
Date: Wed, 26 Aug 2009 14:14:19 +0200

I have pushed to candidates/semantic-error-messages changes that make it possible for messages to display the semantic value of the "unexpected" token, as follows (lines starting with a square bracket are answers, the other lines are the input):

1 2;
[00005388:error] !!! 1.3: syntax error, unexpected 2, expecting end of command
1 "2";
[00009774:error] !!! 2.3-5: syntax error, unexpected "2", expecting end of command

instead of previously:

1 1;
[00002152:error] !!! 1.3: syntax error, unexpected float, expecting end of command
1 "2";
[00007376:error] !!! 2.3-5: syntax error, unexpected string, expecting end of command


I did this to explore the design space for syntax error messages, in order to decide which %define variables we should use. I think that using the semantic values in the error messages is a nice feature, but there are several issues.

The first one is the availability of a usable %printer to print the semantic value. In C++ we have stringstreams, which let us reuse stream-oriented %printers in the construction of the error message we wish to pass to the user. I see no means to address this issue in C: C users cannot have the construction of the message for free.
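To make the stringstream point concrete, here is a minimal sketch. The names print_string_value and unexpected_message are hypothetical and not part of lalr1.cc; the only assumption is that the printer writes to an std::ostream, as a stream-oriented %printer body would.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stream-oriented printer, in the spirit of a %printer body:
// all it knows is how to write a value to an std::ostream.
void print_string_value(std::ostream& out, const std::string& value)
{
  out << '"' << value << '"';
}

// Reuse the same printer to build an error message: point it at an
// std::ostringstream instead of the debug stream, then take the string.
std::string unexpected_message(const std::string& value)
{
  std::ostringstream yyo;
  print_string_value(yyo, value);
  return "syntax error, unexpected " + yyo.str();
}
```

The printer itself does not change; only the stream it is handed does, which is what C lacks with FILE*-based printers.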

In the candidate branch I have also allowed myself to detach the printer and destructor of the symbols from the parser and attach them to the symbols themselves. I feel really guilty about this: when I introduced %printer and %destructor it seemed natural to pass the %parse-params to them, and in C++ this is simply achieved by making the %printer and %destructor members of the parser class. But this is nonsense in classical OO programming, and it prevents us from making truly useful destructors and operator<<. I'd like to use NEWS to ask users whether any of them really use %parse-param in the %printer and %destructor, and to remove the "feature" in 2.6 if possible, at least in C++. Admittedly, this is somewhat unrelated to the error-message issues presented here.

Another issue is that I am not sure this is not a dead-end. Of course it's really nice to see the real culprit, but on the other hand, we don't really see it: we see the semantic value, which is not always what the user entered. For instance:

1 00000.02;
[00021410:error] !!! 3.3-10: syntax error, unexpected 0.02, expecting end of command
1 "\x40\x40";
[00051867:error] !!! 5.3-12: syntax error, unexpected "@@", expecting end of command


(Also, the fact that "string" or "float" is no longer displayed might sometimes hide a problem in the input that results in an incorrect lexical category being selected without the user noticing it. For instance, an identifier looks very much like a reserved keyword in most languages, and if you happen to use a keyword you were not aware of, such an error message will not emphasize it. This can easily be addressed by saying "unexpected string "@@"", but that proves that there is not a single choice for this.)

So I guess what we would really like is to have a true copy of yytext stored in the lookahead symbol. Since that's the only one we really need, it should not be too expensive in space (actually, access to yytext and yyleng should suffice; I don't think we need to strndup/free it).

Also, I have always found that "caret-error messages" (I don't know what to call them) are really nice, and completely solve the issue. This is typed by hand to give the idea.

1 00000.02;
[00021410:error] !!! 3.3-10: 1 00000.02;
[00021410:error] !!! 3.3-10:   ^^^^^^^^^
[00021410:error] !!! 3.3-10: syntax error, unexpected float, expecting end of command

This would need a lot of help from the program itself: we cannot expect to keep the whole input in memory, and we cannot expect the AST to keep an exact copy of the yytext of the terminals (not to mention the layout, comments and so forth). But the program may be able to use the location to reopen the input file (if not stdin) and to look for the guilty line.
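As a rough sketch of what the program could do with the location, assuming the input can be read again as a stream: the span type and caret_lines function below are hypothetical and unrelated to Bison's own location class. Columns are taken as 1-based and inclusive, matching how "3.3-10" is printed above.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical single-line location: 1-based line, 1-based inclusive columns.
struct span
{
  unsigned line;
  unsigned first_column;
  unsigned last_column;
};

// Return the guilty line followed by a line of carets under the span,
// or an empty string if the line cannot be found in the input.
std::string caret_lines(std::istream& input, const span& loc)
{
  if (loc.line == 0)
    return std::string();
  std::string text;
  for (unsigned n = 0; n < loc.line; ++n)
    if (!std::getline(input, text))
      return std::string();
  std::string marks;
  if (1 <= loc.first_column && loc.first_column <= loc.last_column)
    {
      // Blanks up to the first column, then one caret per covered column.
      marks.assign(loc.first_column - 1, ' ');
      marks.append(loc.last_column - loc.first_column + 1, '^');
    }
  return text + "\n" + marks + "\n";
}
```

For a real file the caller would pass an std::ifstream opened on the file name recorded in the location. The sketch ignores tabs and multi-line spans, both of which would need extra care.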

So...

So I'm lost. I see no pattern here. I fail to see how bison can possibly please the user with a complete set of Boolean features to customize the error messages.

So I think that what we really need is to open yysyntax_error to the user. We should provide the user with the lookahead, the (full) list of expected tokens, the location, and *she* should forge the error message she wants. That would be something like

%define parse.error.messages "custom"

in which case yysyntax_error (or whatever name we choose) must be provided by the user.
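To give an idea of what a user-provided function might look like, here is a sketch. Everything in it is hypothetical: the syntax_error_context structure and forge_message are invented for illustration, and nothing here is an existing Bison interface. It only assumes the parser hands over the location, the lookahead and the list of expected tokens.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical data the parser would hand to the user.
struct syntax_error_context
{
  std::string location;               // e.g. "3.3-10"
  std::string unexpected;             // kind of the lookahead, e.g. "float"
  std::string text;                   // lookahead as typed; may be empty
  std::vector<std::string> expected;  // full list of expected tokens
};

// One possible user policy: show both the token kind and its text,
// then list every expected token, separated by "or".
std::string forge_message(const syntax_error_context& ctx)
{
  std::string res =
    ctx.location + ": syntax error, unexpected " + ctx.unexpected;
  if (!ctx.text.empty())
    res += " " + ctx.text;
  for (std::size_t i = 0; i < ctx.expected.size(); ++i)
    {
      res += (i == 0) ? ", expecting " : " or ";
      res += ctx.expected[i];
    }
  return res;
}
```

Another user could just as well ignore the text, cap the number of expected tokens, or print carets; the point is that the policy lives in user code rather than in a set of Boolean switches.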


(Oh, and by the way, the error messages presented here are completely wrong. I have no idea why they are, and I hoped that Joel's changes would solve them, but they do not: there are many many other tokens that can follow the initial float, arithmetical operators for instance.)



For the curious, here is how yysyntax_error_ is changed to use %printer in lalr1.cc:

         char const* yyformat = 0;
         switch (yycount)
         {
 #define YYCASE_(N, S)                           \
           case N:                               \
             yyformat = S;                       \
           break
           YYCASE_(1, YY_("syntax error, unexpected %s"));
           YYCASE_(2, YY_("syntax error, unexpected %s, expecting %s"));
           YYCASE_(3, YY_("syntax error, unexpected %s, expecting %s or %s"));
           YYCASE_(4, YY_("syntax error, unexpected %s, expecting %s or %s or %s"));
           YYCASE_(5, YY_("syntax error, unexpected %s, expecting %s or %s or %s or %s"));
 #undef YYCASE_
         }
         // Argument number.
         size_t yyi = 0;
+        // The unexpected token.  Try to print its value.
+        std::ostringstream yyo;
+        yyla.print (yyo);
         for (char const* yyp = yyformat; *yyp; ++yyp)
           if (yyp[0] == '%' && yyp[1] == 's' && yyi < yycount)
           {
-            yyres += yytnamerr_ (yyarg[yyi++]);
+            if (!yyi && !yyo.str ().empty ())
+              yyres += yyo.str ();
+            else
+              yyres += yytnamerr_ (yyarg[yyi]);
+            ++yyi;
             ++yyp;
           }
           else
             yyres += *yyp;


I have added symbol::type_name (which uses yytname) to allow messages that also display the token type. The user can write:

%printer { debug_stream() << type_name() << " \"" << $$ << '"'; } <std::string>;





