coreutils

Re: How to sort unicode properly?


From: Eric Blake
Subject: Re: How to sort unicode properly?
Date: Wed, 25 Sep 2019 10:27:58 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.1.0

On 9/25/19 10:20 AM, Peng Yu wrote:
> Hi,
>
> It seems that "café" should be sorted before "caff" in Unicode.
>
> https://github.com/jtauber/pyuca
>
> But `sort` does not do so.
>
> $ printf '%s\n' cafe caff café | LC_ALL=UTF8 sort
> cafe
> caff
> café
> $ printf '%s\n' cafe caff café | LC_ALL=en_US.UTF-8 sort
> cafe
> caff
> café
>
> How can I make `sort` sort according to Unicode order? Thanks.

You'll have to write a locale definition where strcoll() sorts in the order you want. Coreutils sort calls strcoll(), so if the output isn't sorted the way you think it should be, the bug is in your locale and not in coreutils. You'll want to report this issue to whoever provided your en_US.UTF-8 locale (perhaps glibc?).
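
One way to check whether a locale's collation is having any effect is to compare its result with plain byte order. The following is only a sketch, not something from the original exchange: it assumes a POSIX shell and UTF-8 input, and that en_US.UTF-8 is the locale you mean to test (names vary between systems; `locale -a` lists the installed ones). The helper `words` and the variable names are made up for illustration.

```shell
#!/bin/sh
# Illustrative helper (hypothetical, not part of coreutils): emit the test words.
words() { printf '%s\n' cafe caff café; }

# Byte order: with LC_ALL=C, sort compares raw byte values.
c_order=$(words | LC_ALL=C sort)

# Order under the locale being tested. en_US.UTF-8 is an assumption;
# substitute a name that `locale -a` reports on your system.
loc_order=$(words | LC_ALL=en_US.UTF-8 sort)

if [ "$c_order" = "$loc_order" ]; then
    echo "en_US.UTF-8 gave the same order as LC_ALL=C"
else
    echo "en_US.UTF-8 applied its own collation rules:"
    printf '%s\n' "$loc_order"
fi
```

If the two orders match, the likely explanations are that the named locale is not installed (programs then typically stay in the C locale) or that its collation data happens to agree with byte order for these words. Either way the place to look is the locale rather than sort.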

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


