From: Patrice Dumas
Subject: branch master updated: * tp/Texinfo/Convert/Unicode.pm (string_width): remove explicit matching of 0-width characters, they correspond to the remaining characters after matching printing characters except for Marks.
Date: Fri, 08 Sep 2023 17:17:10 -0400
This is an automated email from the git hooks/post-receive script.
pertusus pushed a commit to branch master
in repository texinfo.
The following commit(s) were added to refs/heads/master by this push:
new bb21192b14 * tp/Texinfo/Convert/Unicode.pm (string_width): remove
explicit matching of 0-width characters, they correspond to the remaining
characters after matching printing characters except for Marks.
bb21192b14 is described below
commit bb21192b1409e54e7041fe42dcf0e2bb3745fc16
Author: Patrice Dumas <pertusus@free.fr>
AuthorDate: Fri Sep 8 23:17:02 2023 +0200
* tp/Texinfo/Convert/Unicode.pm (string_width): remove explicit
matching of 0-width characters, they correspond to the remaining
characters after matching printing characters except for Marks.
---
ChangeLog | 6 ++++++
tp/Texinfo/Convert/Unicode.pm | 18 +++---------------
2 files changed, 9 insertions(+), 15 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 50fbd7e044..7d303eeaef 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2023-09-08 Patrice Dumas <pertusus@free.fr>
+
+ * tp/Texinfo/Convert/Unicode.pm (string_width): remove explicit
+ matching of 0-width characters, they correspond to the remaining
+ characters after matching printing characters except for Marks.
+
2023-09-08 Patrice Dumas <pertusus@free.fr>
* tp/Texinfo/Convert/Unicode.pm (string_width): replace IsPrint
diff --git a/tp/Texinfo/Convert/Unicode.pm b/tp/Texinfo/Convert/Unicode.pm
index 7e20d53825..c9dc126886 100644
--- a/tp/Texinfo/Convert/Unicode.pm
+++ b/tp/Texinfo/Convert/Unicode.pm
@@ -1666,19 +1666,13 @@ sub string_width($)
# Optimise for the common case where we can just return the length
# of the string. These regexes are faster than making the substitutions
# below.
- # DefaultIgnorableCodePoint is documented in perl 5.10.1. In 2023 perl,
- # it is Default_Ignorable_Code_Point, but DefaultIgnorableCodePoint
- # seems to work too.
- #if ($string =~ /^[\p{IsPrint}]*$/
+ # IsPrint without \pM
if ($string =~ /^[\p{L}\p{N}\p{P}\p{S}\p{Zs}]*$/
- and $string !~ /[\p{InFullwidth}\pM\p{DefaultIgnorableCodePoint}]/) {
+ and $string !~ /[\p{InFullwidth}]/) {
return length($string);
}
$string =~ s/\p{InFullwidth}/\x{02}/g;
- $string =~ s/\pM/\x{00}/g;
- $string =~ s/\p{DefaultIgnorableCodePoint}/\x{00}/g;
- #$string =~ s/\p{IsPrint}/\x{01}/g;
$string =~ s/[\p{L}\p{N}\p{P}\p{S}\p{Zs}]/\x{01}/g;
$string =~ s/[^\x{01}\x{02}]/\x{00}/g;
@@ -1694,16 +1688,10 @@ sub string_width($)
foreach my $character(split '', $string) {
if ($character =~ /\p{InFullwidth}/) {
$width += 2;
- } elsif ($character =~ /[\pM\p{DefaultIgnorableCodePoint}]/) {
- # a mark set at length 0 or a Default Ignorable Code Point
- # that have no visible glyph or advance width in and of themselves
- #} elsif ($character =~ /\p{IsPrint}/) {
} elsif ($character =~ /[\p{L}\p{N}\p{P}\p{S}\p{Zs}]/) {
$width += 1;
- } elsif ($character =~ /\p{IsControl}/) {
- # Control chars may be added, for instance, as part of @image formatting
} else {
- #print STDERR "unknown char`$character'\n";
+ # zero width character: \pC (including controls), \pM, \p{Zl}, \p{Zp}
}
}
return $width;
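
For readers who want to try the resulting classification outside of the texinfo tree, here is a minimal standalone sketch of the per-character loop as it reads after this commit. It is not the module's code: the name string_width_sketch is hypothetical, and because InFullwidth is a property defined inside Texinfo::Convert::Unicode, the sketch substitutes Perl's built-in East_Asian_Width property (Wide and Fullwidth values) as an assumed approximation. The printing class is the one from the patch.

```perl
use strict;
use warnings;

# Assumption: stand-in for Texinfo's user-defined InFullwidth property,
# built from Perl's East_Asian_Width property. The real property may
# cover a different set of code points.
my $wide = qr/[\p{East_Asian_Width=Wide}\p{East_Asian_Width=Fullwidth}]/;

# "IsPrint without \pM", the character class used in the patched code.
my $printing = qr/[\p{L}\p{N}\p{P}\p{S}\p{Zs}]/;

# Hypothetical helper mirroring the patched loop: wide characters count
# for 2, other printing characters for 1, everything else for 0.
sub string_width_sketch {
  my ($string) = @_;
  my $width = 0;
  foreach my $character (split //, $string) {
    if ($character =~ $wide) {
      $width += 2;
    } elsif ($character =~ $printing) {
      $width += 1;
    }
    # else: zero width (\pC including controls, \pM, \p{Zl}, \p{Zp})
  }
  return $width;
}
```

With this sketch, "e\x{0301}" (e followed by a combining acute accent) and "a\x{200B}b" (with a zero width space) are counted without any explicit test for marks or ignorable code points, since those characters simply fall through to the final branch.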