[PATCH 0/8] Revamp the handling token string aliases in error messages


From: Akim Demaille
Subject: [PATCH 0/8] Revamp the handling token string aliases in error messages
Date: Sat, 29 Dec 2018 17:30:19 +0100

Hi all,

This series of patches addresses two related shortcomings: currently
we destroy non-ASCII token strings (which ruins Hans' use of
mathematical symbols for instance), and we don't provide a means to
translate the token names in error messages.

See https://lists.gnu.org/archive/html/bison-patches/2018-11/msg00030.html.

Paul, I have completely removed your work that quoted the token names
in tname.  In retrospect, I don't think we should have done that.
This way it becomes straightforward to translate these strings, as
shown by the "translate bison's own tokens" change.  Bison's grammar
becomes:

    %token
      GRAM_EOF 0   _("end of file")
      STRING       _("string")
      TSTRING      _("translatable string")

and then I have:

    $ cat /tmp/wrong.y
    %token 12
    %%
    exp:
    
    $ LC_ALL=C ./_build/8d/tests/bison /tmp/wrong.y
    /tmp/wrong.y:1.8-9: error: syntax error, unexpected integer literal, expecting character literal or identifier or <tag>
     %token 12
            ^^
    
    $ ./_build/8d/tests/bison /tmp/wrong.y
    /tmp/wrong.y:1.8-9: erreur: erreur de syntaxe, littéral entier inattendu, attendait caractère littéral ou identifiant ou <tag>
     %token 12
            ^^
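
To illustrate the idea behind the two runs above, here is a minimal
sketch in C of translating token names only when the message is built.
It is not the code from this series: demo_translate is a hypothetical
stand-in for gettext(), and demo_tokens, demo_token_name and the
"translatable" flag are names made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for gettext(): a real build would call gettext()
   from <libintl.h>.  Returns the French text when known, else the input.  */
static const char *
demo_translate (const char *msgid)
{
  static const char *const catalog[][2] = {
    { "integer literal", "littéral entier" },
    { "identifier", "identifiant" },
  };
  assert (msgid != NULL);
  for (size_t i = 0; i < sizeof catalog / sizeof catalog[0]; i++)
    if (strcmp (catalog[i][0], msgid) == 0)
      return catalog[i][1];
  return msgid;
}

/* Hypothetical token-name table: names are stored untranslated and
   unquoted, plus a flag saying whether the grammar wrapped them in _().  */
struct demo_token { const char *name; int translatable; };

static const struct demo_token demo_tokens[] = {
  { "integer literal", 1 },
  { "identifier", 1 },
  { "<tag>", 0 },
};

/* Name to display for token I: translated only if marked with _().  */
static const char *
demo_token_name (size_t i)
{
  assert (i < sizeof demo_tokens / sizeof demo_tokens[0]);
  return demo_tokens[i].translatable
         ? demo_translate (demo_tokens[i].name)
         : demo_tokens[i].name;
}
```

Because the table keeps the untranslated, unquoted name, the same
entry can serve any locale; only the lookup at display time changes.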

My changes also alter the signature of yytnamerr, which you made
overridable by the user through an #ifndef yytnamerr.  This
customization point was never documented.  I think that if we simply
rename it, existing code will continue to compile, but for people who
overrode yytnamerr some error messages will change, perhaps
unexpectedly.
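
For readers unfamiliar with this kind of customization point, the
following is a simplified sketch of an #ifndef-guarded hook.  It is not
the skeleton's code: demo_tnamerr and demo_tnamerr_default are
hypothetical names, and the default here only strips one pair of
surrounding double quotes, with no handling of escape sequences.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch of a user-overridable hook: a user who defines the macro
   demo_tnamerr before this point replaces the default below.
   demo_tnamerr is a hypothetical name, not Bison's.  */
#ifndef demo_tnamerr
# define demo_tnamerr demo_tnamerr_default

/* Copy NAME into RES (if non-null), dropping one pair of surrounding
   double quotes, and return the length of the result.  With a null RES,
   only compute the length so the caller can size a buffer.  */
static size_t
demo_tnamerr_default (char *res, const char *name)
{
  size_t len;
  assert (name != NULL);
  len = strlen (name);
  if (len >= 2 && name[0] == '"' && name[len - 1] == '"')
    {
      name += 1;
      len -= 2;
    }
  if (res)
    {
      memcpy (res, name, len);
      res[len] = '\0';
    }
  return len;
}
#endif
```

A caller would typically invoke the hook twice: once with a null
buffer to learn the size, then again with a buffer of that size plus
one.  Changing the signature or the name of such a hook silently
bypasses any user override, which is the risk described above.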

I also removed the support for trigraphs.  Again, I would claim that
that's the user's problem.

However, I have broken a documented contract: the documentation
clearly specifies how tokens are stored in yytname:

 -- Directive: %token-table
     Generate an array of token names in the parser implementation file.
     The name of the array is ‘yytname’; ‘yytname[I]’ is the name of the
     token whose internal Bison token code number is I.  The first three
     elements of ‘yytname’ correspond to the predefined tokens ‘"$end"’,
     ‘"error"’, and ‘"$undefined"’; after these come the symbols defined
     in the grammar file.

     The name in the table includes all the characters needed to
     represent the token in Bison.  For single-character literals and
     literal strings, this includes the surrounding quoting characters
     and any escape sequences.  For example, the Bison single-character
     literal ‘'+'’ corresponds to a three-character name, represented in
     C as ‘"'+'"’; and the Bison two-character literal string ‘"\\/"’
     corresponds to a five-character name, represented in C as
     ‘"\"\\\\/\""’.
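
As a quick check of the character counts in the quoted text, this
small C sketch spells out the two stored names.  The variable and
function names (demo_plus, demo_backslash_slash, demo_name_length) are
made up for the example.

```c
#include <assert.h>
#include <string.h>

/* Names as the quoted documentation says yytname stores them.
   demo_plus and demo_backslash_slash are hypothetical variable names.  */
static const char demo_plus[] = "'+'";                  /* the literal '+'  */
static const char demo_backslash_slash[] = "\"\\\\/\""; /* the string "\\/" */

/* Number of characters in a stored token name.  */
static size_t
demo_name_length (const char *name)
{
  assert (name != NULL);
  return strlen (name);
}
```

The first name is three characters (quote, plus, quote); the second is
five (double quote, two backslashes, slash, double quote), since the
escape sequence is kept as written in the grammar.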

I don't quite understand what people can do with this table.  In
particular, it does not readily help to generate scanner rules
directly, since the connection with the external token number (the one
returned by yylex) is not trivial and is not documented.
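
To make the difficulty concrete, here is a sketch of a scanner helper
that searches a table in the documented layout for a literal string
token.  The tables are hypothetical: demo_tname mimics yytname, and
demo_toknum is an assumed second table giving the external numbers,
which is exactly the piece the documentation does not provide.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical tables in the documented %token-table layout: literal
   strings keep their surrounding double quotes.  */
static const char *const demo_tname[] = {
  "$end", "error", "$undefined", "\"<=\"", "\"==\"", "'+'",
};
/* Hypothetical external numbers (what yylex would return), same order.  */
static const int demo_toknum[] = { 0, 256, 257, 258, 259, '+' };

enum { DEMO_NTOKENS = sizeof demo_tname / sizeof demo_tname[0] };

/* Return the external number of the literal string token spelled TEXT
   (without quotes), or -1 if the table has no such token.  */
static int
demo_lookup_string_token (const char *text)
{
  size_t len;
  assert (text != NULL);
  len = strlen (text);
  for (int i = 0; i < DEMO_NTOKENS; i++)
    {
      const char *name = demo_tname[i];
      if (name[0] == '"'
          && strncmp (name + 1, text, len) == 0
          && name[len + 1] == '"'
          && name[len + 2] == '\0')
        return demo_toknum[i];
    }
  return -1;
}
```

Note that the comparison depends on the quotes being stored in the
table, so code of this shape is what would be affected by storing
unquoted names.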

Rici, you might have some relevant input on this issue.  If that's
really a problem, we can generate two tables: one for backward
compatibility (deprecated?), and the new one for error messages.


This series of patches is a starting point for discussing alternatives.
Nothing is cast in stone here.

I would really like to address this in 3.3, which I expect to release
within a couple of months at most.  This feature was the last one
expected in 3.3.


This is currently on gnu.org in the token-i18n branch, and available
in these tarballs:

https://www.lrde.epita.fr/~akim/private/bison/bison-3.2.1.153-3e2f3.tar.gz
https://www.lrde.epita.fr/~akim/private/bison/bison-3.2.1.153-3e2f3.tar.xz

Cheers!


Akim Demaille (8):
  yacc.c: avoid negated if
  parsers: revamp the interface of yytnamerr
  tests: no longer play with trigraphs
  parsers: don't double escape tnames
  parsers: support translatable token aliases
  tests: check that internationalization of token works
  translate bison's own tokens
  regen

 data/skeletons/glr.c      |   90 +--
 data/skeletons/lalr1.cc   |   56 +-
 data/skeletons/lalr1.d    |   38 +-
 data/skeletons/lalr1.java |   41 +-
 data/skeletons/yacc.c     |   75 +-
 src/output.c              |   33 +-
 src/parse-gram.c          | 1358 +++++++++++++++++++------------------
 src/parse-gram.h          |  141 ++--
 src/parse-gram.y          |   96 +--
 src/scan-gram.l           |   25 +-
 src/symtab.c              |    3 +-
 src/symtab.h              |    7 +-
 tests/calc.at             |   21 +-
 tests/input.at            |   10 +-
 tests/javapush.at         |   64 +-
 tests/local.at            |    5 +-
 tests/regression.at       |   38 +-
 17 files changed, 1019 insertions(+), 1082 deletions(-)

-- 
2.20.0


