coreutils


Re: How to sort unicode properly?


From: Peng Yu
Subject: Re: How to sort unicode properly?
Date: Wed, 25 Sep 2019 10:56:29 -0500

I want to make my `sort` machine-independent, so that it always uses the
correct Unicode sort order. Is there a way to do so?

I don't know how to check where my en_US.UTF-8 locale comes from. Do you
know how to check it? (I use Mac OS X.)
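To make "correct Unicode sort order" concrete for the words in this thread: under the Unicode Collation Algorithm, accents are a secondary difference that only matters when the base letters tie, so "café" compares like "cafe" first and therefore lands before "caff". The sketch below is a deliberately tiny toy, not pyuca and not the real UCA. It only knows the accented lowercase Latin-1 letters (U+00E0 to U+00FF in UTF-8), treats them as their base letter in a first pass, and breaks ties with plain byte order as a rough stand-in for the accent level. The names `toy_collate` and `toy_sort` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Toy first-level key: read one character at *p, advance *p past it and
   return a comparison value. Accented lowercase Latin-1 letters
   (UTF-8 bytes 0xC3 0xA0..0xBF, i.e. U+00E0..U+00FF) map to their base
   ASCII letter; all other bytes are returned as-is. Returns 0 at the end. */
static int next_base_char(const unsigned char **p)
{
    const unsigned char *s = *p;
    if (s[0] == '\0')
        return 0;
    if (s[0] == 0xC3 && s[1] >= 0xA0 && s[1] <= 0xBF) {
        unsigned cp = s[1] + 0x40u;                 /* decode U+00E0..U+00FF */
        *p = s + 2;
        if (cp <= 0xE5) return 'a';                 /* à á â ã ä å */
        if (cp == 0xE7) return 'c';                 /* ç */
        if (cp >= 0xE8 && cp <= 0xEB) return 'e';   /* è é ê ë */
        if (cp >= 0xEC && cp <= 0xEF) return 'i';   /* ì í î ï */
        if (cp == 0xF1) return 'n';                 /* ñ */
        if (cp >= 0xF2 && cp <= 0xF6) return 'o';   /* ò ó ô õ ö */
        if (cp >= 0xF9 && cp <= 0xFC) return 'u';   /* ù ú û ü */
        if (cp == 0xFD || cp == 0xFF) return 'y';   /* ý ÿ */
        return 0x100 + (int)cp;  /* unmapped (æ, ø, ...): after every byte */
    }
    *p = s + 1;
    return s[0];
}

/* First pass: compare base letters only, ignoring the accents above. */
static int primary_cmp(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;
    for (;;) {
        int ca = next_base_char(&pa);
        int cb = next_base_char(&pb);
        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (ca == 0)
            return 0;   /* both strings ended with equal base letters */
    }
}

/* Hypothetical two-level comparison: base letters first, then byte order
   as a crude tie-breaker ('e' is 0x65, below the UTF-8 lead byte 0xC3,
   so the unaccented word comes first). */
int toy_collate(const char *a, const char *b)
{
    int r = primary_cmp(a, b);
    return r != 0 ? r : strcmp(a, b);
}

static int cmp_toy(const void *x, const void *y)
{
    return toy_collate(*(const char *const *)x, *(const char *const *)y);
}

/* Sort an array of UTF-8 strings in place with the toy collation. */
void toy_sort(const char **lines, size_t n)
{
    qsort(lines, n, sizeof *lines, cmp_toy);
}
```

Sorting cafe, caff and café with `toy_sort` gives cafe, café, caff, the order the question attributes to pyuca. Real implementations (pyuca, ICU) use the full multi-level weight tables from the Unicode standard instead of a hand-written mapping like this one.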

On 9/25/19, Eric Blake <address@hidden> wrote:
> On 9/25/19 10:20 AM, Peng Yu wrote:
>> Hi,
>>
>> It seems that "café" should sort before "caff" under the Unicode
>> Collation Algorithm.
>>
>> https://github.com/jtauber/pyuca
>>
>> But `sort` does not do so.
>>
>> $ printf '%s\n' cafe caff café | LC_ALL=UTF8  sort
>> cafe
>> caff
>> café
>> $ printf '%s\n' cafe caff café | LC_ALL=en_US.UTF-8  sort
>> cafe
>> caff
>> café
>>
>> How can I make `sort` sort according to Unicode order? Thanks.
>
> You'll have to write a locale definition where strcoll() sorts in the
> order you want.  Coreutils sort is calling strcoll(), and if it doesn't
> sort the way you think it should, the bug is in your locale and not in
> coreutils.  You'll want to report this issue to whoever provided your
> en_US.UTF-8 locale (perhaps glibc?)
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
>
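The reply's point can be seen in a few lines of C. The sketch below is a simplified stand-in for what the reply describes, not coreutils' actual code (the real `sort` also handles key fields and other options): select a collation locale with `setlocale()`, then sort with `qsort()` using `strcoll()` as the comparison. The function name `sort_with_locale` is hypothetical.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: order two lines by the current LC_COLLATE rules. */
static int coll_cmp(const void *x, const void *y)
{
    return strcoll(*(const char *const *)x, *(const char *const *)y);
}

/* Hypothetical helper: sort lines under the named collation locale,
   e.g. "C" or "en_US.UTF-8"; "" means "take it from the environment"
   (LC_ALL, then LC_COLLATE, then LANG).
   Returns 0 on success, -1 if the locale cannot be selected. */
int sort_with_locale(const char **lines, size_t n, const char *locale)
{
    if (setlocale(LC_COLLATE, locale) == NULL)
        return -1;
    qsort(lines, n, sizeof *lines, coll_cmp);
    return 0;
}
```

With `"C"`, `strcoll()` compares plain bytes, so the three words come out as cafe, caff, café: the same order shown in the question, and the same on every machine, but not Unicode collation order. With `""` the result depends entirely on the collation table of whatever locale the environment selects, which is why the reply places the problem in the locale rather than in `sort`.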


-- 
Regards,
Peng



