bug-gnu-emacs

From: Maxim Nikulin
Subject: bug#59275: Unexpected return value of `string-collate-lessp' on Mac
Date: Sun, 27 Nov 2022 22:19:24 +0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2

On 27/11/2022 21:23, Eli Zaretskii wrote:
>> From: Maxim Nikulin
>> Date: Sun, 27 Nov 2022 21:00:50 +0700
>>
>> Concerning Org, my point is that caseless sorting should be uniform.
>
> You need to work hard to get that.  Just using 'downcase' is not enough, and
> neither is using 'string-collate-equalp'.

I do not like that some functions use `string-collate-lessp' with the IGNORE-CASE argument while other places pass strings through `downcase'. When a proper locale implementation is available, I believe it is better to use IGNORE-CASE consistently. I assume that the text is presented to users, not serialized to be saved or sent as data.

When `string-collate-lessp' disregards IGNORE-CASE, I consider it acceptable to use `downcase' (`upcase' may be worse since Org currently uses `downcase'). It provides a reasonable balance between the effort invested and the result obtained.

>> Doesn't the composed/decomposed representation affect the comparison result?

> They are different texts, so yes, they do, and they should.
> If you want to treat such strings as equivalent, you need to work even
> harder, since Emacs currently doesn't have enough infrastructure to do it
> right in all cases.

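(setq lst `("semana" "señor" ,(ucs-normalize-NFD-string "señor") "sepia"))
;; The third element is the decomposed (NFD) spelling of "señor".
(sort (copy-sequence lst) #'string-lessp)   ; `sort' is destructive, so sort a copy
=> ("semana" "señor" "sepia" "señor")
;; NFD "señor" sorts before "sepia", the precomposed one after it.
(sort (copy-sequence lst) #'string-collate-lessp)
=> ("semana" "señor" "señor" "sepia")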

`string-collate-lessp' is able to handle at least some of these cases, which is another argument for using it.
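One inexpensive step toward treating the two spellings as equivalent is to normalize both arguments before comparing them. The sketch below is an assumption, not code from Org or from this thread: the helper name `my-nfd-string-lessp' is hypothetical, and it relies on `ucs-normalize-NFD-string' from the ucs-normalize library. It is not real collation; it only happens to reproduce the `string-collate-lessp' order for this list because in NFD the base letter "n" comes before the combining tilde, and "n" sorts before "p".

```elisp
(require 'ucs-normalize)

;; Hypothetical helper: compare code points after decomposing both
;; strings, so precomposed and decomposed spellings compare equal.
(defun my-nfd-string-lessp (s1 s2)
  "Return non-nil if S1 sorts before S2 after NFD normalization."
  (string-lessp (ucs-normalize-NFD-string s1)
                (ucs-normalize-NFD-string s2)))

(let ((lst `("semana" "señor" ,(ucs-normalize-NFD-string "señor") "sepia")))
  ;; `sort' may modify its argument, so sort a copy.
  (sort (copy-sequence lst) #'my-nfd-string-lessp))
;; => ("semana" "señor" "señor" "sepia")
```

Accent ordering, case, and language-specific rules are still untouched by this, which is consistent with the remark above that doing it right in all cases needs more infrastructure.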

>> https://stackoverflow.com/questions/319426/how-do-i-do-a-case-insensitive-string-comparison

> This is about Python, no?

The value of this link is its collection of examples that are not obvious to everybody. They apply to the behavior of `string-lessp' vs. `string-collate-lessp' as well.

>> From my point of view e.g. case transformation rule for Turkish I is a
>> minor issue

> Why, Org doesn't want to support Turkish users?

From my point of view it is a minor issue in comparison to

    (string-collate-lessp "a" "B" "C" t)  ; => nil

which breaks case-insensitive comparison not only for accented letters but even for plain ASCII ones.

You almost managed to convince Ihor to use `string-lessp' instead of `string-collate-lessp'. I do not think that would improve support for the Turkish language.

My suggestion is to fall back to `downcase' and `string-lessp' only if `string-collate-lessp' is unable to provide case-insensitive comparison.
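A minimal sketch of that fallback, assuming that probing the "C" locale once at load time is an acceptable way to detect whether IGNORE-CASE has any effect. The names `my-collate-honors-ignore-case' and `my-string-lessp-ignore-case' are hypothetical; they are not part of Org or Emacs.

```elisp
;; Hypothetical probe: in the "C" locale "a" (97) sorts after "B" (66)
;; unless case is ignored, so a non-nil result means IGNORE-CASE took effect.
(defvar my-collate-honors-ignore-case
  (string-collate-lessp "a" "B" "C" t)
  "Non-nil if `string-collate-lessp' appears to honor IGNORE-CASE.")

;; Hypothetical comparison function used everywhere instead of mixing
;; IGNORE-CASE calls with ad hoc `downcase' calls.
(defun my-string-lessp-ignore-case (s1 s2)
  "Return non-nil if S1 sorts before S2, ignoring case.
Use locale-aware collation when it supports IGNORE-CASE, otherwise
fall back to `downcase' plus `string-lessp'."
  (if my-collate-honors-ignore-case
      (string-collate-lessp s1 s2 nil t)
    (string-lessp (downcase s1) (downcase s2))))

(sort (list "b" "C" "a") #'my-string-lessp-ignore-case)
;; => ("a" "b" "C")
```

The probe is only a heuristic: it shows whether IGNORE-CASE has any effect at all, not that it is correct for every locale.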

>> My argument against `downcase' in `string-collate-lessp' is that it may
>> add a noticeable performance penalty.

> I'd worry about correctness before performance.

`downcase' with `string-lessp' handles more cases than `string-lessp' alone (leaving aside buffer-local case conversion tables), so from my point of view the former is more correct. Even `downcase' with a fixed "C" locale may give results more consistent with user expectations. My impression is that users may already be familiar with the widespread problems with sorting.
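To illustrate the claim above with plain ASCII (a sketch; the word list is arbitrary): `string-lessp' compares character codes, so every uppercase ASCII letter sorts before every lowercase one, while downcasing both arguments first restores alphabetical order for such strings.

```elisp
(let ((words (list "banana" "Cherry" "apple")))
  (list
   ;; Code-point order: "C" (67) < "a" (97) < "b" (98).
   (sort (copy-sequence words) #'string-lessp)
   ;; Compare lowercased copies; the original strings are returned.
   (sort (copy-sequence words)
         (lambda (s1 s2) (string-lessp (downcase s1) (downcase s2))))))
;; => (("Cherry" "apple" "banana") ("apple" "banana" "Cherry"))
```

Downcasing inside the predicate repeats the work on every comparison, which is where the performance penalty mentioned earlier comes from.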




