branch master updated: tp/TODO: update
From: Patrice Dumas
Subject: branch master updated: tp/TODO: update
Date: Sun, 04 Feb 2024 07:37:37 -0500
This is an automated email from the git hooks/post-receive script.
pertusus pushed a commit to branch master
in repository texinfo.
The following commit(s) were added to refs/heads/master by this push:
new 92662fb158 tp/TODO: update
92662fb158 is described below
commit 92662fb1582b61708c12037dfff2134b31f77dee
Author: Patrice Dumas <pertusus@free.fr>
AuthorDate: Sun Feb 4 13:37:16 2024 +0100
tp/TODO: update
---
tp/TODO | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/tp/TODO b/tp/TODO
index 431eb0fb3f..6c281ee3da 100644
--- a/tp/TODO
+++ b/tp/TODO
@@ -68,17 +68,24 @@ Delayed bugs
Add XS override for Document merged_indices?
Sorting indices in C with strxfrm_l using the "en_US.utf-8" locale with
-LC_COLLATE_MASK is quite consistent with perl for number and letters, but
-leads to a different output than with Perl for non alphanumeric characters,
-which is probably somewhat incidental. There are also differences that seem to
-be related to spaces with a result that looks better in Perl. It could be the
-effect of 'variable' => 'Non-Ignorable' in Perl, as it allows to have spaces
-and punctuation marks sort before letters.
+LC_COLLATE_MASK on Debian GNU/Linux with glibc is quite consistent with perl
+for number and letters, but leads to a different output than with Perl for non
+alphanumeric characters. It is because in Perl we set 'variable' => 'Non-Ignorable'
+to set Variable Weighting to Non-ignorable (see
+http://www.unicode.org/reports/tr10/#Variable_Weighting).
+For spaces, the output with Non-Ignorable Variable Weighting looks better for
+index sorting, as it allows to have spaces and punctuation marks sort before
+letters.
+
+In case sorting according to locale would be needed in perl it seems that
+the way to do it is to set
+use locale
+in a block and use regular sorting, it would be as if strcoll/strxfrm was used.
Transliteration/protection with iconv in C leads to a result different of Perl
for some characters. It seems that the iconv result depends on the locale, and
there are quite a bit of ? output, probably when there is no obvious
-transliteration. In those cases, the Unidecode traansliterations are not
+transliteration. In those cases, the Unidecode transliterations are not
necessarily very good, either.
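
As a rough illustration of the strxfrm_l approach mentioned in the TODO entry
above, here is a minimal C sketch that compares two strings through collation
keys built with LC_COLLATE_MASK. The helper names collation_key and
compare_in_locale are hypothetical and are not taken from the texinfo sources.
Falling back to the "C" locale when the requested locale is not installed is
also an assumption made for this sketch only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated collation key for S in
   locale LOC, or NULL if allocation fails.  The caller frees the result. */
char *
collation_key (const char *s, locale_t loc)
{
  /* With a size of 0, strxfrm_l only reports the length the key needs. */
  size_t len = strxfrm_l (NULL, s, 0, loc);
  char *key = malloc (len + 1);
  if (key != NULL)
    strxfrm_l (key, s, len + 1, loc);
  return key;
}

/* Hypothetical helper: compare A and B by their collation keys in
   LOCALE_NAME (for example "en_US.utf-8").  Assumption: if that locale is
   not available, the "C" locale is used instead. */
int
compare_in_locale (const char *a, const char *b, const char *locale_name)
{
  locale_t loc;
  char *key_a, *key_b;
  int result;

  assert (a != NULL && b != NULL && locale_name != NULL);

  loc = newlocale (LC_COLLATE_MASK, locale_name, (locale_t) 0);
  if (loc == (locale_t) 0)
    loc = newlocale (LC_COLLATE_MASK, "C", (locale_t) 0);
  if (loc == (locale_t) 0)
    return strcmp (a, b);   /* no usable locale object at all */

  key_a = collation_key (a, loc);
  key_b = collation_key (b, loc);

  /* Keys produced by strxfrm_l are meant to be compared with strcmp. */
  if (key_a != NULL && key_b != NULL)
    result = strcmp (key_a, key_b);
  else
    result = strcmp (a, b);  /* allocation failed, compare the raw bytes */

  free (key_a);
  free (key_b);
  freelocale (loc);
  return result;
}
```

When sorting many index entries, it would usually make sense to compute each
key once and sort on the stored keys with strcmp, rather than creating a locale
object and transforming both strings on every comparison as this sketch does.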