bug-texinfo

Re: index sorting in texi2any in C issue with spaces


From: Patrice Dumas
Subject: Re: index sorting in texi2any in C issue with spaces
Date: Sun, 4 Feb 2024 12:17:16 +0100

On Thu, Feb 01, 2024 at 10:16:07PM +0000, Gavin Smith wrote:
> An alternative is not to have such a variable but just to have an option
> to collate according to the user's locale.  Then the user would run e.g.
> "LC_COLLATE=ll_LL.UTF-8 texi2any ..." to use collation from the ll_LL.UTF-8
> locale.  They would have to have the locale installed that was appropriate
> for whichever manual they were processing (assuming the "variable weighting"
> option is appropriate.)

I do not like that possibility.  I think that we should avoid using the
user's locale for document output in general.  If we do use the user
locale, I think that it should be through strxfrm in C and "use locale" in
Perl, not by checking a specific LC_COLLATE value in the environment.

Here is my updated thinking on the possibilities:

1) lexicographic sorting on Unicode strings (corresponds to
   USE_UNICODE_COLLATION=0 currently).
2) Unicode default sorting obtained by Unicode::Collate in Perl and
   strxfrm_l in C with "en_US.utf-8", the current default (the locale
   name could differ between platforms; a list of candidate names could
   be used instead of a single one if "en_US.utf-8" is not always
   available...).
3) sorting based on @documentlanguage using, in Perl,
   Unicode::Collate::Locale with locale @documentlanguage and, in C,
   strxfrm_l with "@documentlanguage.utf-8" (at least on GNU/Linux;
   the locale name set up for strxfrm_l could be different on other
   platforms).
4) sorting based on a customization variable, such as COLLATION_LANGUAGE.
   It would be the same as the previous one, with @documentlanguage
   replaced by COLLATION_LANGUAGE.
5) sorting based on the user locale, using strxfrm in C and
   "use locale" with regular sorting on Unicode (Perl internal encoding)
   strings in Perl.
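
To make the C side of 1) to 4) concrete, here is a minimal sketch of how a
sort key could be built with strxfrm_l.  It is not the texi2any code: the
function names collation_key and open_collation_locale are hypothetical, and
the "LANG.UTF-8" locale naming is an assumption that matches GNU/Linux only.
When no locale object is available, the key is a plain copy of the string,
which gives the bytewise ordering of 1).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: open a collation-only locale for a language code
   such as "en_US" or "fr".  Returns (locale_t) 0 if the locale is not
   installed.  The ".UTF-8" suffix is an assumption valid on GNU/Linux. */
locale_t
open_collation_locale (const char *lang)
{
  char name[64];
  snprintf (name, sizeof name, "%s.UTF-8", lang);
  return newlocale (LC_COLLATE_MASK, name, (locale_t) 0);
}

/* Hypothetical helper: return a newly allocated sort key for S.
   With a locale, the key comes from strxfrm_l, so that strcmp on two
   keys orders them according to that locale's collation.  Without a
   locale, the key is a copy of S, so strcmp gives bytewise ordering.
   The caller frees the result; NULL is returned on allocation failure. */
char *
collation_key (const char *s, locale_t loc)
{
  size_t len;
  char *key;

  if (loc == (locale_t) 0)
    return strdup (s);

  /* First call only computes the length needed for the key. */
  len = strxfrm_l (NULL, s, 0, loc);
  key = malloc (len + 1);
  if (key)
    strxfrm_l (key, s, len + 1, loc);
  return key;
}
```

With such helpers, 2) would call open_collation_locale ("en_US"), 3) and 4)
would pass the @documentlanguage or COLLATION_LANGUAGE value instead, and the
index entries would then be sorted with qsort using strcmp on the
precomputed keys.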

1) and 2) are already implemented and currently customized with
USE_UNICODE_COLLATION.  I do not think that we need 5), but we could
implement it if users ask for it.  We do not need to implement the other
options right away, but we may want to think now about how those options
are selected, so that the customization options do not have to change
when they are implemented.  I think that the options are:

* use only one variable with a textual value, for example, for 1-5
  above:
  USE_COLLATION=basic/default/documentlanguage/custom/locale
* use different variables as switches between the different options, for
  instance USE_UNICODE_COLLATION to switch to 1), and more or less
  one variable for each of the other possibilities.
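
As an illustration of the first option, a single textual variable could be
mapped to an enumeration as in the sketch below.  The enum, the function name
parse_use_collation, and the choice to treat an unset variable as "default"
are all hypothetical; only the five value names come from the proposal above.

```c
#include <string.h>

/* Hypothetical enumeration, in the same order as possibilities 1-5. */
enum collation_mode
{
  COLLATION_BASIC,            /* 1) lexicographic on Unicode strings */
  COLLATION_DEFAULT,          /* 2) Unicode default collation */
  COLLATION_DOCUMENTLANGUAGE, /* 3) based on @documentlanguage */
  COLLATION_CUSTOM,           /* 4) based on COLLATION_LANGUAGE */
  COLLATION_LOCALE,           /* 5) based on the user locale */
  COLLATION_INVALID           /* unrecognized value */
};

/* Hypothetical parser for a USE_COLLATION value.  A NULL (unset) value
   is assumed here to select the default collation. */
enum collation_mode
parse_use_collation (const char *value)
{
  static const char *const names[] =
    { "basic", "default", "documentlanguage", "custom", "locale" };
  size_t i;

  if (value == NULL)
    return COLLATION_DEFAULT;

  for (i = 0; i < sizeof names / sizeof names[0]; i++)
    if (strcmp (value, names[i]) == 0)
      return (enum collation_mode) i;

  return COLLATION_INVALID;
}
```

One point in favour of this shape is that adding a new possibility later only
adds a value name, without introducing another variable whose interaction
with the existing ones has to be defined.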

I personally would favour using only one customization variable, but I
will implement whatever is preferred.

-- 
Pat


