Re: Proposed interface changes for "parse.error custom"

bison-patches

From:	Akim Demaille
Subject:	Re: Proposed interface changes for "parse.error custom"
Date:	Thu, 27 Feb 2020 07:04:50 +0100

Hi Adrian,

Thanks a lot for https://github.com/akimd/bison/pull/28!

> On 26 Feb 2020, at 23:58, Adrian Vogelsgesang <address@hidden> wrote:
> 
> Hi Akim
> 
> 
> regarding our discussion on yysyntax_error_arguments
> --------------------------------------------------------------------------
> 
>> Why not, but where/when would you expect to use one and not the others?
> Use case for lookahead token without expected tokens:

I'm sorry, this sentence is a remnant of a previous version of my answer, in 
which I had misunderstood your point.  I rewrote the answer, but forgot to 
remove it.

> Use case for expected tokens, but without the lookahead token:
> I am having a harder time to come up with a good use case here.

Actually, that's easy: that's a request from Christian Schoenebeck 
(https://lists.gnu.org/r/bison-patches/2020-01/msg00002.html), to be able to 
propose autocompletion features in incremental parsers (i.e., push parsers).


> 1. I would like to treat expected tokens and the lookahead value 
> independently.

Makes perfect sense.

> 2. As a new user of bison who wants to get the lookahead token, I would 
> probably not realize that `yysyntax_error_arguments` does what I am looking 
> for.

Here, you are getting close to my hidden agenda...

One secret motivation for not exposing the lookahead separately is that I fear 
we will then get requests to also expose the lookahead's value, so that one 
could report

syntax error: unexpected 42, expected + or -

instead of

syntax error: unexpected integer literal, expected + or -

There are several reasons I'm not fond of this approach.  Some are related to 
the feature itself:

- most of the time the lexeme (the sequence of chars the user typed that became 
a token) cannot be restored from its semantic value.  For instance 0101010, 42, 
0x2A and 0x2a all map to "42" (and possibly you would also have 0_10_10_10 etc. 
in some languages).  The same goes for strings with escapes, etc.
- some tokens can be monsters, say long string literals, and you should 
certainly not quote them in the error message
- showing precisely the token type may help the user understand that a small, 
subtle typo made the difference between, say, an int and a float.
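
To make the first point concrete, here is a minimal sketch in C.  The helper 
names (lexeme_to_value, value_to_text) are hypothetical and not part of Bison, 
and I use the C spelling 052 for octal rather than the 0101010 above, since 
strtol does not read binary literals.  It shows several lexemes collapsing to 
one semantic value, after which only "42" can be printed back:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical scanner helper: convert an integer lexeme to its semantic
   value.  Base 0 lets strtol accept decimal, octal (leading 0) and
   hexadecimal (leading 0x or 0X) spellings.  */
static long
lexeme_to_value (const char *lexeme)
{
  return strtol (lexeme, NULL, 0);
}

/* Hypothetical formatter: the only text recoverable from the semantic
   value is its decimal rendering; the original spelling is lost.
   Returns what snprintf returns.  */
static int
value_to_text (char *buf, size_t size, long value)
{
  return snprintf (buf, size, "%ld", value);
}
```

Feeding "42", "052", "0x2A" and "0x2a" through lexeme_to_value and then 
value_to_text gives the same text for all four, so an error message built from 
the value alone could not quote what the user actually typed.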

Some are related to the implementation:

- formatting a semantic value requires knowing its type and being able to 
manipulate it.  Unfortunately that would require either giving the end user 
access to the internal symbol numbers (as opposed to the external token numbers 
that *are* exposed), or activating yytoknum.
- of course %printers come to the rescue!  But because %printers are 
stream-based, they won't help in C to format a semantic value into a string, 
which becomes a problem when it comes to supporting internationalization of 
error messages.

Many of these issues can be addressed if the scanner not only returns semantic 
values, but also lexemes.  But that's sooooo heavy...

I'm a strong believer that:
- the error message itself should actually be about token types, not semantic 
values
- the context (such as the lexeme) is immensely useful, I certainly agree with 
that, but it is best exposed with underlined source quotation, as GCC, Clang 
and Bison do:

> $ bison foo.y
> foo.y:1.8-11: error: expected string or character literal or identifier 
> before integer literal
>     1 | %start 0x2a
>       |        ^~~~
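
For illustration only, the underline in such a quotation can be computed from 
the location's column range alone.  This is a sketch, not how Bison implements 
it: format_underline is a hypothetical helper, and it assumes 1-based, 
inclusive columns with one column per character (no tabs or multibyte 
characters):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write into BUF (of SIZE bytes) an underline such
   as "       ^~~~" covering the 1-based, inclusive column range
   FIRST..LAST.  Return 0 on success, -1 if the range is invalid or BUF
   is too small.  */
static int
format_underline (char *buf, size_t size, int first, int last)
{
  if (first < 1 || last < first || size < (size_t) last + 1)
    return -1;
  memset (buf, ' ', (size_t) first - 1);            /* leading padding */
  buf[first - 1] = '^';                             /* start of the range */
  memset (buf + first, '~', (size_t) (last - first)); /* rest of the range */
  buf[last] = '\0';
  return 0;
}
```

With the location 1.8-11 from the example, format_underline (buf, sizeof buf, 
8, 11) produces seven spaces followed by ^~~~, which is then printed under the 
quoted source line.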

So, all this to say that I was putting the lookahead in the "set of tokens" 
precisely to avoid drawing too much attention to it.



> A function named `yyget_lookahead` is much more self-describing. Yes, bison 
> belongs to the category of programs which are hard to use without actually 
> reading the documentation and I am sure that we would document 
> `yysyntax_error_arguments` as the supported way to get the lookahead token. 
> However, the reviewers of my code changes to our parser, or other people 
> coming across my commits later, might not have read the bison docs and have a 
> hard time to understand what exactly `yysyntax_error_arguments` does.

I find this argument somewhat weak :)  Not reading the doc is not an excuse.


> 3. Naming this function `yysyntax_error_arguments` makes it harder to extend 
> the provided information in the future. We are introducing the `ParseContext` 
> concept to be able to potentially provide additional information in the 
> future.

I disagree here.  It's ctx that provides the extensible information, 
yysyntax_error_arguments is just about the token types.  That's already the 
case now for the location for instance.

I'll fork my answer for the other half of your message.

Cheers!
