From: Gavin D. Smith
Subject: branch master updated: * tp/Texinfo/Convert/Unicode.pm (string_width): Reset count at a newline. Add a comment saying what the different character classes mean.
Date: Sun, 31 Dec 2023 15:54:47 -0500
This is an automated email from the git hooks/post-receive script.
gavin pushed a commit to branch master
in repository texinfo.
The following commit(s) were added to refs/heads/master by this push:
new 4cbd328c8f * tp/Texinfo/Convert/Unicode.pm (string_width): Reset count
at a newline. Add a comment saying what the different character classes mean.
4cbd328c8f is described below
commit 4cbd328c8fb07f38ca52b5eaab6a76d297c1d3be
Author: Gavin Smith <gavinsmith0123@gmail.com>
AuthorDate: Sun Dec 31 20:54:39 2023 +0000
* tp/Texinfo/Convert/Unicode.pm (string_width):
Reset count at a newline. Add a comment saying what the
different character classes mean.
---
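For readers skimming the patch, the behaviour described in the commit message can be sketched roughly as follows. This is a minimal standalone illustration, not the code in Unicode.pm: the name `sketch_string_width` is hypothetical, and the built-in East_Asian_Width properties are used here only as an assumed approximation of Texinfo's own `InFullwidth` property.

```perl
use strict;
use warnings;

# Hypothetical sketch: count display columns, restarting at each newline,
# so the result is the width of the text after the last newline.
sub sketch_string_width {
    my ($string) = @_;
    my $width = 0;
    foreach my $character (split //, $string) {
        # Assumption: Wide/Fullwidth stand in for Texinfo's InFullwidth.
        if ($character =~ /[\p{East_Asian_Width=Wide}\p{East_Asian_Width=Fullwidth}]/) {
            $width += 2;
        } elsif ($character =~ /[\p{L}\p{N}\p{P}\p{S}\p{Zs}]/) {
            # Letter, Number, Punctuation, Symbol, Space_Separator
            $width += 1;
        } elsif ($character eq "\n") {
            # Reset the count at a newline.
            $width = 0;
        }
        # Anything else (marks, controls, line/paragraph separators)
        # is treated as zero width.
    }
    return $width;
}

print sketch_string_width("abc"), "\n";               # 3
print sketch_string_width("abc\nde"), "\n";           # 2
print sketch_string_width("\x{4E2D}\x{6587}"), "\n";  # 4
print sketch_string_width("e\x{301}"), "\n";          # 1
```

With the reset in place, a title that contains a newline is measured by its final line only, which appears to be why the underlines in the test results below get shorter.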
ChangeLog | 7 ++-
tp/Texinfo/Convert/Unicode.pm | 21 +++++---
.../formats_encodings/at_commands_in_refs.pl | 56 ++++++++++-----------
.../at_commands_in_refs_latin1.pl | 2 +-
.../res_info/at_commands_in_refs_latin1.info | Bin 8004 -> 7999 bytes
.../formats_encodings/at_commands_in_refs_utf8.pl | 2 +-
.../res_info/at_commands_in_refs_utf8.info | Bin 8401 -> 8396 bytes
.../unclosed_verb_on_section_line.pl | 2 +-
8 files changed, 50 insertions(+), 40 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 345f8b2513..857e059e18 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2023-12-31 Gavin Smith <gavinsmith0123@gmail.com>
+
+ * tp/Texinfo/Convert/Unicode.pm (string_width):
+ Reset count at a newline. Add a comment saying what the
+ different character classes mean.
+
2023-12-31 Gavin Smith <gavinsmith0123@gmail.com>
* tp/Texinfo/Translations.pm (gdt_string_columns): Adjust to
@@ -18,7 +24,6 @@
(convert_xtable_command), commands_internal_conversion_table):
implement convert_multitable_command and convert_xtable_command.
-
2023-12-31 Patrice Dumas <pertusus@free.fr>
* tp/Texinfo/XS/convert/convert_html.c (convert_enumerate_command)
diff --git a/tp/Texinfo/Convert/Unicode.pm b/tp/Texinfo/Convert/Unicode.pm
index 4ccf70440b..bc977e61b1 100644
--- a/tp/Texinfo/Convert/Unicode.pm
+++ b/tp/Texinfo/Convert/Unicode.pm
@@ -1692,20 +1692,23 @@ sub string_width($)
# Optimise for the common case where we can just return the length
# of the string. These regexes are faster than making the substitutions
# below.
- # IsPrint without \pM
+ # IsPrint without \p{Mark}. Matches classes Letter, Number, Punct, Symbol,
+ # and Space_Separator.
if ($string =~ /^[\p{L}\p{N}\p{P}\p{S}\p{Zs}]*$/
and $string !~ /[\p{InFullwidth}]/) {
return length($string);
}
- $string =~ s/\p{InFullwidth}/\x{02}/g;
- $string =~ s/[\p{L}\p{N}\p{P}\p{S}\p{Zs}]/\x{01}/g;
- $string =~ s/[^\x{01}\x{02}]/\x{00}/g;
+ if ($string !~ /\n/) {
+ $string =~ s/\p{InFullwidth}/\x{02}/g;
+ $string =~ s/[\p{L}\p{N}\p{P}\p{S}\p{Zs}]/\x{01}/g;
+ $string =~ s/[^\x{01}\x{02}]/\x{00}/g;
- # This sums up the byte values of the bytes in $string, which now are
- # all either 0, 1 or 2. This is faster. The original, more readable
- # version is below.
- return unpack("U0%32A*", $string);
+ # This sums up the byte values of the bytes in $string, which now are
+ # all either 0, 1 or 2. This is faster. The original, more readable
+ # version is below.
+ return unpack("U0%32A*", $string);
+ }
if (! defined($string)) {
cluck();
@@ -1716,6 +1719,8 @@ sub string_width($)
$width += 2;
} elsif ($character =~ /[\p{L}\p{N}\p{P}\p{S}\p{Zs}]/) {
$width += 1;
+ } elsif ($character eq "\n") {
+ $width = 0;
} else {
# zero width character: \pC (including controls), \pM, \p{Zl}, \p{Zp}
}
diff --git a/tp/t/results/formats_encodings/at_commands_in_refs.pl b/tp/t/results/formats_encodings/at_commands_in_refs.pl
index 64ac2c45e7..c41eaef674 100644
--- a/tp/t/results/formats_encodings/at_commands_in_refs.pl
+++ b/tp/t/results/formats_encodings/at_commands_in_refs.pl
@@ -13821,7 +13821,7 @@ $result_texts{'at_commands_in_refs'} = 'Top
2 !
. . ? @
-*****************
+*********
3 @ { } \\ #
***********
@@ -15603,7 +15603,7 @@ $result_converted{'plaintext'}->{'at_commands_in_refs'}
= 'Top
2 !
. . ? @
-**************
+*********
3 @ { } \\ #
***********
@@ -16760,7 +16760,7 @@ File: , Node: ! . . ? @, Next: @ { } \\ #, Prev:
{ }, Up: Top
2 !
. . ? @
-**************
+*********
File: , Node: @ { } \\ #, Next: LaTeX TeX • , © ... ..., Prev: ! . .
? @, Up: Top
@@ -16974,31 +16974,31 @@ Tag Table:
Node: Top27
Node: { }783
Node: ! . . ? @862
-Node: @ { } \\ #966
-Node: LaTeX TeX • , © ... ...1085
-Node: ≡ error→ € ¡ ↦ −1235
-Node: ≥ ≤ →1367
-Node: ª º ⋆ £ ⊣ ¿ ®1465
-Node: ⇒ ° a b a sunny day å1584
-Node: Å æ œ Æ Œ ø Ø ß ł Ł Ð ð Þ þ1741
-Node: ä ẽ î â à é ç ē e̊ e̋ ę1920
-Node: ė ĕ e̲ ẹ ě ȷ e͡e2086
-Node: ı Ḕ Ḉ2216
-Node: “ ” ‘ ’ „ ‚2314
-Node: « » « » ‹ ›2419
-Node: `` \'\' --- -- ` \'2535
-Node: AAA (fff) AAA BBB2659
-Node: CCC (rrr) CCC DDD2799
-Node: the someone <someone@somewher> <no_explain@there>2972
-Node: [f--ile1] [image src="f--ile.png" alt="alt" text="Image
description\\"\\"\\\\."