branch master updated: tp/TODO: update

texinfo-commits


From:	Patrice Dumas
Subject:	branch master updated: tp/TODO: update
Date:	Sun, 04 Feb 2024 07:37:37 -0500

This is an automated email from the git hooks/post-receive script.

pertusus pushed a commit to branch master
in repository texinfo.

The following commit(s) were added to refs/heads/master by this push:
     new 92662fb158 tp/TODO: update
92662fb158 is described below

commit 92662fb1582b61708c12037dfff2134b31f77dee
Author: Patrice Dumas <pertusus@free.fr>
AuthorDate: Sun Feb 4 13:37:16 2024 +0100

    tp/TODO: update
---
 tp/TODO | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/tp/TODO b/tp/TODO
index 431eb0fb3f..6c281ee3da 100644
--- a/tp/TODO
+++ b/tp/TODO
@@ -68,17 +68,24 @@ Delayed bugs
 Add XS override for Document merged_indices?
 
 Sorting indices in C with strxfrm_l using the "en_US.utf-8" locale with
-LC_COLLATE_MASK is quite consistent with perl for number and letters, but
-leads to a different output than with Perl for non alphanumeric characters,
-which is probably somewhat incidental.  There are also differences that seem to
-be related to spaces with a result that looks better in Perl.  It could be the
-effect of 'variable' => 'Non-Ignorable' in Perl, as it allows to have spaces
-and punctuation marks sort before letters.
+LC_COLLATE_MASK on Debian GNU/Linux with glibc is quite consistent with Perl
+for numbers and letters, but leads to a different output than with Perl for non
+alphanumeric characters.  It is because in Perl we set 'variable' => 'Non-Ignorable'
+to set Variable Weighting to Non-ignorable (see
+http://www.unicode.org/reports/tr10/#Variable_Weighting).
+For spaces, the output with Non-Ignorable Variable Weighting looks better for
+index sorting, as it makes spaces and punctuation marks sort before
+letters.
+
+If sorting according to the locale were ever needed in Perl, it seems that
+the way to do it would be to put
+use locale
+in a block and use the regular sort there, which would behave as if strcoll/strxfrm were used.
 
 Transliteration/protection with iconv in C leads to a result different of Perl
 for some characters.  It seems that the iconv result depends on the locale, and
 there are quite a bit of ? output, probably when there is no obvious
-transliteration.  In those cases, the Unidecode traansliterations are not
+transliteration.  In those cases, the Unidecode transliterations are not
 necessarily very good, either.
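
The C-side approach mentioned in the TODO entry (strxfrm_l with a locale obtained through LC_COLLATE_MASK) can be sketched as follows. This is a minimal illustration assuming a POSIX.1-2008 C library such as glibc; the names sort_by_locale, collation_key and struct sort_entry are hypothetical and not taken from texinfo's code. One key is computed per entry, and comparing keys with strcmp yields the same order as strcoll_l on the original strings, while transforming each string only once.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: an index entry paired with its precomputed collation key.  */
struct sort_entry
{
  const char *text;
  char *key;
};

/* Return a malloc'd strxfrm_l key for TEXT in locale LOC, or NULL if
   allocation fails.  */
static char *
collation_key (const char *text, locale_t loc)
{
  assert (text != NULL);
  /* With n == 0 and a NULL destination, strxfrm_l only reports the
     length of the transformed string.  */
  size_t len = strxfrm_l (NULL, text, 0, loc);
  char *key = malloc (len + 1);
  if (key != NULL)
    strxfrm_l (key, text, len + 1, loc);
  return key;
}

/* Comparing transformed keys with strcmp gives the same order as
   strcoll_l on the original strings.  */
static int
compare_entries (const void *a, const void *b)
{
  const struct sort_entry *ea = a;
  const struct sort_entry *eb = b;
  return strcmp (ea->key, eb->key);
}

/* Sort the N strings in STRINGS in place following the collation of
   LOCALE_NAME, falling back to the "C" locale (byte order) if that
   locale is not installed.  Returns 0 on success, -1 on failure.  */
static int
sort_by_locale (const char **strings, size_t n, const char *locale_name)
{
  if (n < 2)
    return 0;

  locale_t loc = newlocale (LC_COLLATE_MASK, locale_name, (locale_t) 0);
  if (loc == (locale_t) 0)
    loc = newlocale (LC_COLLATE_MASK, "C", (locale_t) 0);
  if (loc == (locale_t) 0)
    return -1;

  int status = -1;
  /* calloc zeroes the keys, so the cleanup below can free them safely
     even if key construction stops early.  */
  struct sort_entry *entries = calloc (n, sizeof *entries);
  if (entries == NULL)
    goto out;

  for (size_t i = 0; i < n; i++)
    {
      entries[i].text = strings[i];
      entries[i].key = collation_key (strings[i], loc);
      if (entries[i].key == NULL)
        goto out;
    }

  qsort (entries, n, sizeof *entries, compare_entries);
  for (size_t i = 0; i < n; i++)
    strings[i] = entries[i].text;
  status = 0;

 out:
  if (entries != NULL)
    for (size_t i = 0; i < n; i++)
      free (entries[i].key);
  free (entries);
  freelocale (loc);
  return status;
}
```

Called as sort_by_locale (words, n, "en_US.utf-8"), a list such as zebra, apple, Mango, banana would come out in byte order (Mango first) under the "C" fallback, while a glibc en_US locale treats case as a secondary difference and would put apple first. Since this relies on the glibc locale tables and not on Unicode::Collate with Non-Ignorable Variable Weighting, it is expected to show the differences for non-alphanumeric characters described above.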

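The locale dependence noted for iconv transliteration can be checked with a small C sketch. It assumes glibc (or GNU libiconv), where the //TRANSLIT suffix on the target encoding requests transliteration; transliterate_to_ascii is a hypothetical helper, not texinfo's code. With glibc the transliteration rules come from the LC_CTYPE locale, so running it in the default "C" locale and again after setlocale (LC_CTYPE, "") in a UTF-8 locale may turn é into ? in the first case and into e in the second.

```c
#include <assert.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical helper: convert the UTF-8 string IN to ASCII, asking
   iconv to transliterate characters with no ASCII equivalent.  The
   NUL-terminated result is written to OUT, which holds OUTSIZE bytes.
   Returns 0 on success, -1 if the converter cannot be opened or the
   conversion fails (for instance invalid UTF-8 or OUT too small).  */
static int
transliterate_to_ascii (const char *in, char *out, size_t outsize)
{
  assert (in != NULL && out != NULL);
  if (outsize == 0)
    return -1;

  /* "//TRANSLIT" is understood by glibc and GNU libiconv; with glibc
     the transliteration tables come from the LC_CTYPE locale.  */
  iconv_t cd = iconv_open ("ASCII//TRANSLIT", "UTF-8");
  if (cd == (iconv_t) -1)
    return -1;

  char *inp = (char *) in;      /* iconv does not write to the input */
  size_t inleft = strlen (in);
  char *outp = out;
  size_t outleft = outsize - 1; /* keep room for the terminating NUL */

  size_t r = iconv (cd, &inp, &inleft, &outp, &outleft);
  /* Flush any pending shift state (a no-op for ASCII output).  */
  if (r != (size_t) -1)
    r = iconv (cd, NULL, NULL, &outp, &outleft);
  iconv_close (cd);
  if (r == (size_t) -1)
    return -1;

  *outp = '\0';
  return 0;
}
```

Comparing the output of such a helper with Text::Unidecode on the same strings is one way to list the characters for which the C and Perl results diverge.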