texinfo-commits
From: Patrice Dumas
Subject: branch master updated: * tp/Texinfo/Convert/Unicode.pm (string_width): remove explicit matching of 0-width characters, they correspond to the remaining characters after matching printing characters except for Marks.
Date: Fri, 08 Sep 2023 17:17:10 -0400

This is an automated email from the git hooks/post-receive script.

pertusus pushed a commit to branch master
in repository texinfo.

The following commit(s) were added to refs/heads/master by this push:
     new bb21192b14 * tp/Texinfo/Convert/Unicode.pm (string_width): remove 
explicit matching of 0-width characters, they correspond to the remaining 
characters after matching printing characters except for Marks.
bb21192b14 is described below

commit bb21192b1409e54e7041fe42dcf0e2bb3745fc16
Author: Patrice Dumas <pertusus@free.fr>
AuthorDate: Fri Sep 8 23:17:02 2023 +0200

    * tp/Texinfo/Convert/Unicode.pm (string_width): remove explicit
    matching of 0-width characters, they correspond to the remaining
    characters after matching printing characters except for Marks.
---
 ChangeLog                     |  6 ++++++
 tp/Texinfo/Convert/Unicode.pm | 18 +++---------------
 2 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 50fbd7e044..7d303eeaef 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2023-09-08  Patrice Dumas <pertusus@free.fr>
+
+       * tp/Texinfo/Convert/Unicode.pm (string_width): remove explicit
+       matching of 0-width characters, they correspond to the remaining
+       characters after matching printing characters except for Marks.
+
 2023-09-08  Patrice Dumas <pertusus@free.fr>
 
        * tp/Texinfo/Convert/Unicode.pm (string_width): replace IsPrint
diff --git a/tp/Texinfo/Convert/Unicode.pm b/tp/Texinfo/Convert/Unicode.pm
index 7e20d53825..c9dc126886 100644
--- a/tp/Texinfo/Convert/Unicode.pm
+++ b/tp/Texinfo/Convert/Unicode.pm
@@ -1666,19 +1666,13 @@ sub string_width($)
   # Optimise for the common case where we can just return the length
   # of the string.  These regexes are faster than making the substitutions
   # below.
-  # DefaultIgnorableCodePoint is documented in perl 5.10.1. In 2023 perl,
-  # it is Default_Ignorable_Code_Point, but DefaultIgnorableCodePoint
-  # seems to work too.
-  #if ($string =~ /^[\p{IsPrint}]*$/
+  # IsPrint without \pM
   if ($string =~ /^[\p{L}\p{N}\p{P}\p{S}\p{Zs}]*$/
-      and $string !~ /[\p{InFullwidth}\pM\p{DefaultIgnorableCodePoint}]/) {
+      and $string !~ /[\p{InFullwidth}]/) {
     return length($string);
   }
 
   $string =~ s/\p{InFullwidth}/\x{02}/g;
-  $string =~ s/\pM/\x{00}/g;
-  $string =~ s/\p{DefaultIgnorableCodePoint}/\x{00}/g;
-  #$string =~ s/\p{IsPrint}/\x{01}/g;
   $string =~ s/[\p{L}\p{N}\p{P}\p{S}\p{Zs}]/\x{01}/g;
   $string =~ s/[^\x{01}\x{02}]/\x{00}/g;
 
@@ -1694,16 +1688,10 @@ sub string_width($)
   foreach my $character(split '', $string) {
     if ($character =~ /\p{InFullwidth}/) {
       $width += 2;
-    } elsif ($character =~ /[\pM\p{DefaultIgnorableCodePoint}]/) {
-      # a mark set at length 0 or a Default Ignorable Code Point
-      # that have no visible glyph or advance width in and of themselves
-    #} elsif ($character =~ /\p{IsPrint}/) {
     } elsif ($character =~ /[\p{L}\p{N}\p{P}\p{S}\p{Zs}]/) {
       $width += 1;
-    } elsif ($character =~ /\p{IsControl}/) {
-      # Control chars may be added, for instance, as part of @image formatting
     } else {
-      #print STDERR "unknown char`$character'\n";
+      # zero width character: \pC (including controls), \pM, \p{Zl}, \p{Zp}
     }
   }
   return $width;
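
The per-character loop above relies on the Unicode general categories forming a partition: any character that is not L, N, P, S or Zs has to be C, M, Zl or Zp, so the final `else` branch covers every zero-width case without matching it explicitly. A minimal standalone sketch of that idea follows. It is an illustration, not the Texinfo code: `sketch_width` is a hypothetical name, and the module's own `InFullwidth` property is replaced here by Perl's built-in `East_Asian_Width` property, which is an assumption and may not cover exactly the same ranges.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Texinfo::Convert::Unicode.
# Assumption: East_Asian_Width Wide/Fullwidth stands in for the
# module's InFullwidth property.
sub sketch_width {
  my $string = shift;
  my $width = 0;
  foreach my $character (split //, $string) {
    if ($character
          =~ /[\p{East_Asian_Width=Wide}\p{East_Asian_Width=Fullwidth}]/) {
      # wide characters take two columns
      $width += 2;
    } elsif ($character =~ /[\p{L}\p{N}\p{P}\p{S}\p{Zs}]/) {
      # ordinary printing characters take one column
      $width += 1;
    }
    # everything else (\pC, \pM, \p{Zl}, \p{Zp}) adds nothing
  }
  return $width;
}

print sketch_width("abc"), "\n";              # 3
print sketch_width("e\x{301}"), "\n";         # 1: combining acute is a Mark
print sketch_width("a\x{200B}b"), "\n";       # 2: ZERO WIDTH SPACE is Cf
print sketch_width("\x{4E2D}\x{6587}"), "\n"; # 4: two wide CJK characters
```

The wide test has to come first, as in the patched function, because wide characters are also letters or symbols and would otherwise be counted as one column.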
