bug-gnu-emacs

bug#59275: Unexpected return value of `string-collate-lessp' on Mac


From: Eli Zaretskii
Subject: bug#59275: Unexpected return value of `string-collate-lessp' on Mac
Date: Wed, 16 Nov 2022 15:00:06 +0200

> From: Ihor Radchenko <yantar92@posteo.net>
> Cc: 59275@debbugs.gnu.org
> Date: Wed, 16 Nov 2022 01:34:09 +0000
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> >> > string-collate-lessp is inherently platform- (and locale-) dependent.
> >> > Don't use it if you want consistent results across platforms and
> >> > locales.
> >> 
> >> Is there a better alternative?
> >
> > Alternative to do what job?
> 
> Reliable sorting.
> In particular, I am looking for a better PREDICATE argument for
> `sort-subr' for case-sensitive and case-insensitive sorting of strings.

In the strict order of Unicode codepoints?  Use compare-strings.
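
As a minimal sketch of what such a predicate could look like (the name
my-string-codepoint-lessp is hypothetical, not an existing Emacs
function): compare-strings returns t when the strings are equal and a
negative or positive number otherwise, so "less than" is simply the
negative case.  With a non-nil IGNORE-CASE, characters are upcased
before being compared, which assumes the current buffer's case table
is the standard one.

```elisp
;; Hypothetical helper, shown for illustration only.
(defun my-string-codepoint-lessp (s1 s2 &optional ignore-case)
  "Return non-nil if S1 sorts before S2 in codepoint order.
If IGNORE-CASE is non-nil, characters are upcased before comparison.
The result does not depend on the locale or the platform collation."
  ;; nil START/END arguments mean "compare the whole string".
  (let ((result (compare-strings s1 nil nil s2 nil nil ignore-case)))
    ;; `compare-strings' returns t for equal strings, so check for a
    ;; number before testing its sign.
    (and (integerp result) (< result 0))))

;; Case-sensitive: uppercase ASCII letters sort before lowercase ones.
(sort (list "b" "A" "a" "B") #'my-string-codepoint-lessp)
;; => ("A" "B" "a" "b")

;; Case-insensitive: wrap the predicate to pass IGNORE-CASE.
(sort (list "b" "A" "C")
      (lambda (a b) (my-string-codepoint-lessp a b t)))
;; => ("A" "b" "C")
```

The same predicate (or the lambda wrapper) can be passed wherever a
two-argument string predicate is expected, at the cost of ignoring
any language-specific ordering rules.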

> >> Also, do I miss something, or is this pitfall not documented in the
> >> docstring of `string-collate-lessp'?
> >
> > It isn't?  Then what is this about:
> >
> >   This function obeys the conventions for collation order in your
> >   locale settings.  For example, punctuation and whitespace characters
> >   might be considered less significant for sorting:
> >
> >   (sort '("11" "12" "1 1" "1 2" "1.1" "1.2") 'string-collate-lessp)
> >     => ("11" "1 1" "1.1" "12" "1 2" "1.2")
> >   [...]
> >   To emulate Unicode-compliant collation on MS-Windows systems,
> >   bind ‘w32-collate-ignore-punctuation’ to a non-nil value, since
> >   the codeset part of the locale cannot be "UTF-8" on MS-Windows.
> 
> The above sounds like we just need to worry about some edge cases where
> different approaches may exist to sorting. Like with punctuation,
> numbers, and spaces.
> 
> Having
> 
>   (string-collate-lessp "a" "B" "C" t)  ; => nil
> 
> is totally unexpected because case-insensitive "a"<"B"<"C" sounds like
> the only reasonable outcome.

It is hard to guess what will be unexpected for people.  When the doc
string was written, the example used there was deemed to be the most
striking surprise from using locale-dependent collation, so it was
what we used.

> I'd like the warning to be even more prominent.

You want to make it explicit that for systems where we use
string-lessp the IGNORE-CASE argument is ignored?  Or do you want some
other change?

Anyway, feel free to suggest some text to that effect.




