[PATCH 4/4] diagnostics: fix the handling of multibyte characters

bison-patches

From:	Akim Demaille
Subject:	[PATCH 4/4] diagnostics: fix the handling of multibyte characters
Date:	Sun, 21 Apr 2019 09:41:49 +0200

This is a pity: effort was invested in correctly computing the
number of screen columns consumed by multibyte characters, but the
routines that do so were fed single-byte inputs...

As a consequence, Bison never displayed locations correctly when the
input contained multibyte characters.
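
To illustrate the mismatch, here is a minimal sketch (not code from Bison; `utf8_code_points` is a hypothetical helper, and the input is assumed to be well-formed UTF-8). It contrasts the number of bytes in a string with the number of characters it encodes. A width routine that receives the bytes one at a time never sees a complete multibyte character.

```c
#include <stddef.h>

/* Count the code points in the NUL-terminated UTF-8 string S by
   skipping continuation bytes (those of the form 10xxxxxx).
   Assumes S is well-formed UTF-8.  */
static size_t
utf8_code_points (const char *s)
{
  size_t n = 0;
  for (; *s; ++s)
    if (((unsigned char) *s & 0xC0) != 0x80)
      ++n;
  return n;
}
```

For twelve `é`, each encoded as the two bytes 0xC3 0xA9, `strlen` reports 24 while `utf8_code_points` reports 12.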

* src/scan-gram.l (ucp): New.
Use it instead of . in the catch-all clause.
* tests/diagnostics.at (Tabulations): Enhance into...
(Tabulations and multibyte characters): this.
---
 src/scan-gram.l      | 18 +++++++++++++++---
 tests/diagnostics.at | 33 +++++++++++++++++++++++++++------
 2 files changed, 42 insertions(+), 9 deletions(-)
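
For readers who find the `ucp` pattern hard to parse, the following is a rough C transcription of its byte ranges, together with the "whole code point, else one byte" fallback of the `{ucp}|.` rule. It is only a sketch: `in_range`, `ucp_length` and `chunk_length` are hypothetical names that do not exist in Bison, and the actual matching is done by the Flex-generated scanner.

```c
#include <stddef.h>

/* True if LO <= B <= HI.  */
static int
in_range (unsigned char b, unsigned char lo, unsigned char hi)
{
  return lo <= b && b <= hi;
}

/* Length in bytes of the well-formed UTF-8 sequence at the start of
   S[0..N-1], using the same byte ranges as the ucp pattern, or 0 if
   there is none.  Hypothetical helper, not part of Bison.  */
static size_t
ucp_length (const unsigned char *s, size_t n)
{
  if (n == 0)
    return 0;
  unsigned char b = s[0];
  /* One byte: tab, LF, CR, or printable ASCII.  */
  if (b == 0x09 || b == 0x0A || b == 0x0D || in_range (b, 0x20, 0x7E))
    return 1;
  size_t len;
  unsigned char lo = 0x80, hi = 0xBF;   /* Bounds for the second byte.  */
  if (in_range (b, 0xC2, 0xDF))
    len = 2;
  else if (in_range (b, 0xE0, 0xEF))
    {
      len = 3;
      if (b == 0xE0)
        lo = 0xA0;                      /* Reject overlong forms.  */
      else if (b == 0xED)
        hi = 0x9F;                      /* Reject surrogates.  */
    }
  else if (in_range (b, 0xF0, 0xF4))
    {
      len = 4;
      if (b == 0xF0)
        lo = 0x90;                      /* Reject overlong forms.  */
      else if (b == 0xF4)
        hi = 0x8F;                      /* Stay at or below U+10FFFF.  */
    }
  else
    return 0;
  if (n < len || !in_range (s[1], lo, hi))
    return 0;
  for (size_t i = 2; i < len; ++i)
    if (!in_range (s[i], 0x80, 0xBF))
      return 0;
  return len;
}

/* Length of the next chunk: a whole code point when possible, else a
   single byte, mirroring the "{ucp}|." rule.  */
static size_t
chunk_length (const unsigned char *s, size_t n)
{
  size_t len = ucp_length (s, n);
  return len ? len : (n ? 1 : 0);
}
```

For instance, `chunk_length` returns 2 on the two bytes of `é` (0xC3 0xA9) and 1 on a stray 0xFF, so input that is not UTF-8 is still consumed one byte at a time.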

diff --git a/src/scan-gram.l b/src/scan-gram.l
index c69b1b5d..c8a596cd 100644
--- a/src/scan-gram.l
+++ b/src/scan-gram.l
@@ -135,11 +135,13 @@ static void unexpected_newline (boundary, char const *);
 %x SC_BRACKETED_ID SC_RETURN_BRACKETED_ID
 
 letter    [.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_]
-notletter [^.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_]{-}[%\{]
 id        {letter}({letter}|[-0-9])*
 int       [0-9]+
 xint      0[xX][0-9abcdefABCDEF]+
 
+ /* UTF-8 Encoded Unicode Code Point */
+ucp       [\x09\x0A\x0D\x20-\x7E]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF]([\x80-\xBF]{2})|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF]([\x80-\xBF]{2})|[\xF1-\xF3]([\x80-\xBF]{3})|\xF4[\x80-\x8F]([\x80-\xBF]{2})
+
 /* Zero or more instances of backslash-newline.  Following GCC, allow
    white space between the backslash and the newline.  */
 splice   (\\[ \f\t\v]*\n)*
@@ -790,8 +792,18 @@ eqopt    ({sp}=)?
   | By default, grow the string obstack with the input.  |
   `-----------------------------------------------------*/
 
-<SC_COMMENT,SC_LINE_COMMENT,SC_BRACED_CODE,SC_PREDICATE,SC_PROLOGUE,SC_EPILOGUE,SC_STRING,SC_CHARACTER,SC_ESCAPED_STRING,SC_ESCAPED_CHARACTER>. |
-  <SC_COMMENT,SC_LINE_COMMENT,SC_BRACED_CODE,SC_PREDICATE,SC_PROLOGUE,SC_EPILOGUE>\n    STRING_GROW;
+  /* Accept multibyte characters in one block instead of byte after
+     byte, so that add_column_width and mbsnwidth can compute correct
+     screen width.
+
+     Add a fallthrough "|." so that non UTF-8 input is still accepted
+     and does not jam the scanner.  */
+
+
+<SC_COMMENT,SC_LINE_COMMENT,SC_BRACED_CODE,SC_PREDICATE,SC_PROLOGUE,SC_EPILOGUE,SC_STRING,SC_CHARACTER,SC_ESCAPED_STRING,SC_ESCAPED_CHARACTER>
+{
+  {ucp}|.   STRING_GROW;
+}
 
 %%
 
diff --git a/tests/diagnostics.at b/tests/diagnostics.at
index 606b0373..e28eecf8 100644
--- a/tests/diagnostics.at
+++ b/tests/diagnostics.at
@@ -106,18 +106,24 @@ input.y:17.2: <warning>warning:</warning> empty rule without %empty [<warning>-W
 ]])
 
 
-## ------------- ##
-## Tabulations.  ##
-## ------------- ##
+## -------------------------------------- ##
+## Tabulations and multibyte characters.  ##
+## -------------------------------------- ##
 
-# Make sure we treat tabulations as eight spaces.
+# Make sure we treat tabulations as eight spaces, and that multibyte
+# characters have correct width.
 
-AT_TEST([[Tabulations]],
+AT_TEST([[Tabulations and multibyte characters]],
 [[%%
-exp: a b c
+exp: a b c d e f g h
 a: {           }
 b: {            }
 c: {------------}
+d: {éééééééééééé}
+e: {∇⃗×𝐸⃗ = -∂𝐵⃗/∂t}
+f: {   42      }
+g: {   "฿¥$€₦" }
+h: {   🐃       }
 ]],
 [[input.y:11.4-17: <warning>warning:</warning> empty rule without %empty [<warning>-Wempty-rule</warning>]
  a: <warning>{         }</warning>
@@ -128,6 +134,21 @@ input.y:12.4-17: <warning>warning:</warning> empty rule without %empty [<warning
 input.y:13.4-17: <warning>warning:</warning> empty rule without %empty [<warning>-Wempty-rule</warning>]
  c: <warning>{------------}</warning>
     <warning>^~~~~~~~~~~~~~</warning>
+input.y:14.4-29: <warning>warning:</warning> empty rule without %empty [<warning>-Wempty-rule</warning>]
+ d: <warning>{éééééééééééé}</warning>
+    <warning>^~~~~~~~~~~~~~~~~~~~~~~~~~</warning>
+input.y:15.4-39: <warning>warning:</warning> empty rule without %empty [<warning>-Wempty-rule</warning>]
+ e: <warning>{∇⃗×𝐸⃗ = -∂𝐵⃗/∂t}</warning>
+    <warning>^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~</warning>
+input.y:16.4-17: <warning>warning:</warning> empty rule without %empty [<warning>-Wempty-rule</warning>]
+ f: <warning>{ 42      }</warning>
+    <warning>^~~~~~~~~~~~~~</warning>
+input.y:17.4-25: <warning>warning:</warning> empty rule without %empty [<warning>-Wempty-rule</warning>]
+ g: <warning>{ "฿¥$€₦" }</warning>
+    <warning>^~~~~~~~~~~~~~~~~~~~~~</warning>
+input.y:18.4-17: <warning>warning:</warning> empty rule without %empty [<warning>-Wempty-rule</warning>]
+ h: <warning>{ 🐃       }</warning>
+    <warning>^~~~~~~~~~~~~~</warning>
 ]])
 
 
-- 
2.21.0
