coreutils

Re: How to sort unicode properly?


From: Peng Yu
Subject: Re: How to sort unicode properly?
Date: Wed, 25 Sep 2019 14:46:57 -0500

If Python can have pyuca, which works across platforms, why can't something
like it exist at the C level?

On Wed, Sep 25, 2019 at 12:24 PM Eric Blake <address@hidden> wrote:

> On 9/25/19 10:56 AM, Peng Yu wrote:
> > I want to make my `sort` to be machine-independent and always use the
> > correct Unicode sort order. Is there a way to do so?
>
> Those two goals are somewhat at odds.  The only truly portable
> machine-independent sorting is the one guaranteed by POSIX when you use
> LC_ALL=C (fun fact: even on an EBCDIC machine, that is required by POSIX
> to collate in ASCII order, rather than native byte order).  The moment
> you use any other locale, you are not only left to the mercies of
> whoever wrote that locale, but also stuck with the fact that there is no
> portable way to transfer locale definitions from one vendor's libc to
> another.
>
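
To illustrate the LC_ALL=C point above, here is a minimal sketch, assuming a POSIX sort and UTF-8 encoded input. In the C locale the comparison is byte by byte, so the result should be the same on every platform, but é (bytes 0xC3 0xA9) lands after every ASCII letter instead of next to e. The variable name `sorted` is only for illustration.

```shell
# Sort three words in the C locale, where ordering is by raw byte value.
# 'e' is 0x65, 'f' is 0x66, and the first byte of UTF-8 'é' is 0xC3,
# so "café" sorts after both "cafe" and "caff".
sorted=$(printf '%s\n' cafe caff café | LC_ALL=C sort)
printf '%s\n' "$sorted"
# prints:
# cafe
# caff
# café
```

This is portable but not linguistically correct, which is the trade-off described in the quoted paragraph.
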
> >
> > I don't know how to check where en_US.UTF-8 comes from. Do you know
> > how to check it? (I use Mac OS X.)
>
> All other locales are somewhat vendor-dependent; as you've discovered,
> your vendor (Apple) has a rather gaping hole in their locale support.
> But because Apple is a closed-source shop, it will have to be Apple that
> fixes their bug, unless you want to take on the gargantuan task of
> writing a gnulib module that provides locale tables to mirror glibc for
> use on non-glibc machines.
>
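
On the question of checking which locales a system provides, here is a rough sketch using the POSIX locale utility. It only lists locale names and shows the collation setting in effect; it does not reveal which vendor tables back them, and the output varies from system to system. The variable name `collate` is only for illustration.

```shell
# List installed locale names that mention en_US (may find none).
locale -a | grep -i 'en_US' || echo "no en_US locale listed"

# Show which collation locale is in effect when LC_ALL=C is set.
collate=$(LC_ALL=C locale | grep '^LC_COLLATE=')
printf '%s\n' "$collate"
```
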
> Note that glibc doesn't have that problem, at least on my system:
>
> $ cat /etc/fedora-release
> Fedora release 30 (Thirty)
> $ rpm -q glibc
> glibc-2.29-22.fc30.x86_64
> $ printf '%s\n' cafe caff café | LC_ALL=en_US.UTF-8  sort --debug
> sort: text ordering performed using ‘en_US.UTF-8’ sorting rules
> cafe
> ____
> café
> ____
> caff
> ____
>
> So one option you could pursue is switching to an operating system that
> does not curtail your freedoms.
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
>
-- 
Regards,
Peng

