bug-gnu-emacs

bug#59275: Unexpected return value of `string-collate-lessp' on Mac


From: Eli Zaretskii
Subject: bug#59275: Unexpected return value of `string-collate-lessp' on Mac
Date: Sun, 27 Nov 2022 17:42:40 +0200

> From: Maxim Nikulin <m.a.nikulin@gmail.com>
> Date: Sun, 27 Nov 2022 22:19:24 +0700
> Cc: Ihor Radchenko <yantar92@posteo.net>, 59275@debbugs.gnu.org
> 
> I do not like that some functions use `string-collate-lessp' with the
> IGNORE-CASE argument while other places pass strings through `downcase'.
> When a proper locale implementation is available, I believe it is better
> to use IGNORE-CASE consistently.

I already explained up-thread why we ignore IGNORE-CASE when collation order
is not known.  I stand by that reasoning.  I believe your opinion is based
on considering only simple locales, and on a-priori knowledge of what the
locale's collation is to begin with, something that Emacs cannot know in
that case.

> When `string-collate-lessp' disregards IGNORE-CASE, I consider it
> acceptable to use `downcase' (`upcase' may be worse since Org currently
> uses `downcase'). It provides a reasonable balance of invested effort and
> obtained result.

We disagree, sorry.

> `("semana" "señor" ,(ucs-normalize-NFD-string "señor") "sepia")
> (sort lst #'string-lessp)
> => ("semana" "señor" "sepia" "señor")
> (sort lst #'string-collate-lessp)
> => ("semana" "señor" "señor" "sepia")
> 
> `string-collate-lessp' is able to handle at least some cases

On what OS and with which libc?

And I don't think this is evidence of collation knowing about equivalent
sequences.  It is most probably a side effect of collation ignoring
Latin accents altogether.
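
One way to tell these two explanations apart is to compare the accented
word against its unaccented spelling as well as against its decomposed
form.  The sketch below is only an illustration: the function name
`my/probe-collation' is hypothetical, and the results of the collation
calls depend on the OS, libc and current locale, so none are shown.

```elisp
(require 'ucs-normalize)

(defun my/probe-collation ()
  "Return a plist describing how \"señor\" compares under collation.
Hypothetical helper, for illustration only."
  (let ((nfc (ucs-normalize-NFC-string "señor"))  ; precomposed ñ (U+00F1)
        (nfd (ucs-normalize-NFD-string "señor"))  ; n + combining tilde (U+0303)
        (plain "senor"))                          ; no accent at all
    (list
     ;; Non-nil here is expected under either explanation.
     :nfc-equals-nfd (string-collate-equalp nfc nfd)
     ;; Non-nil here suggests the accent is simply being ignored.
     :plain-equals-nfc (string-collate-equalp plain nfc)
     ;; Codepoint comparison does not depend on the locale.
     :codepoint-nfd-before-sepia (string-lessp nfd "sepia")
     :codepoint-nfc-before-sepia (string-lessp nfc "sepia"))))
```

If `:plain-equals-nfc' comes back non-nil, the sort result quoted above
says nothing about equivalent sequences being recognized.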

> >> https://stackoverflow.com/questions/319426/how-do-i-do-a-case-insensitive-string-comparison
> > 
> > This is about Python, no?
> 
> The value of this link is its collection of examples that are not obvious
> to everybody. They apply to the behavior of `string-lessp' vs.
> `string-collate-lessp' as well.

Which parts are applicable, in your opinion, and in what way?

> >> From my point of view, e.g. the case transformation rule for Turkish I
> >> is a minor issue
> > 
> > Why, Org doesn't want to support Turkish users?
> 
> From my point of view it is a minor issue in comparison to
> 
>      (string-collate-lessp "a" "B" "C" t)  ; => nil
> 
> that breaks comparison not only for accented letters.

Org is free to make such misguided decisions, but Emacs won't.  We cannot
decide that some locale is "minor" and others are "major".  My suggestion is
to look for a solution that works in any locale.

> You almost managed to convince Ihor to use `string-lessp' instead of
> `string-collate-lessp'. I do not think it would improve the quality of
> support for the Turkish language.

I didn't try to convince Ihor of anything, just pointed out the pitfalls of
using locale-specific collation order in portable programs.  I said back
then that I don't know enough to evaluate your decisions.  Once you
understand the subtle issues with these APIs, it is your call to decide how
to solve your particular problems.

> My suggestion is to fall back to `downcase' and `string-lessp' only if 
> `string-collate-lessp' is unable to provide case insensitive comparison.

You can do that in Org if that's the decision of the Org developers.  Emacs
cannot do that automatically for the reasons I explained up-thread.
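
For concreteness, the fallback proposed in the quoted text could be
sketched on the Org side roughly as follows.  The names
`my/collate-ignores-case-p' and `my/string-lessp-ignore-case' are
hypothetical, and the probe rests on an assumption: that "a" and "A"
collate as equal only when IGNORE-CASE is actually honored.  That may not
hold in every locale, and `downcase' itself does not follow
locale-specific rules such as those for Turkish I.

```elisp
(defun my/collate-ignores-case-p ()
  "Return non-nil if collation appears to honor IGNORE-CASE here.
Hypothetical probe: assumes \"a\" and \"A\" collate as equal only when
case is really being ignored."
  (string-collate-equalp "a" "A" nil t))

(defun my/string-lessp-ignore-case (a b)
  "Return non-nil if string A sorts before string B, ignoring case.
Use locale collation when it honors IGNORE-CASE, else fall back to
`downcase' plus `string-lessp'.  Hypothetical helper."
  (if (my/collate-ignores-case-p)
      (string-collate-lessp a b nil t)
    (string-lessp (downcase a) (downcase b))))
```

It can be passed to `sort' like any other predicate, e.g.
(sort (list "b" "A" "C") #'my/string-lessp-ignore-case).  The probe runs
on every comparison here; a real implementation would probably cache its
result per locale.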

> >> My argument against `downcase' in `string-collate-lessp' is that it may
> >> add a noticeable performance penalty.
> > 
> > I'd worry about correctness before performance.
> 
> `downcase' with `string-lessp' handles more cases than just
> `string-lessp' (leaving aside buffer-local conversion tables), so from
> my point of view the former is more correct.

I'm quite sure this is only true for the cases that you considered, not in
general.

> Even `downcase' with fixed "C" locale may give result more consistent with
> user expectations.

How does it help on systems where locale-specific collation is not
accessible to Emacs?

> My impression is that users may be familiar with widespread problems with
> sorting.

Not IME.  But that's a separate issue, and I don't pretend to know Org users
better than you do, so I will defer to you on this one.




