coreutils

Re: sort --stable (-s) doesn't appear to work on my system


From: Eric Blake
Subject: Re: sort --stable (-s) doesn't appear to work on my system
Date: Tue, 8 Dec 2015 15:25:39 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0

On 12/08/2015 02:26 PM, Terry Farrah wrote:
> I have a tab-separated file that I think is already sorted on the first 3
> columns. Here is a 2-line sample in a file named foo:
> 
> chr10   60379   60380   10:60380-60380  T/T
> chr10   60379   60380   10:60380-60380  G/T
> 
> I try checking it with
> 
> sort -s -k1,1V -k2,2n -k3,3n -c foo
> 
> but the check fails:
> 
> sort: foo:2: disorder: chr10      60379   60380   10:60380-60380  G/T
> 
> If I sort it using the above key specification, it swaps the order of the
> lines:
> 
> sort -s -k1,1V -k2,2n -k3,3n foo
> 
> chr10   60379   60380   10:60380-60380  G/T
> chr10   60379   60380   10:60380-60380  T/T

Doesn't reproduce for me with Fedora's coreutils-8.23-11.fc22.x86_64:

$ printf 'chr10\t60379\t60380\t10:60380-60380\tT/T\nchr10\t60379\t60380\t10:60380-60380\tG/T\n' \
    | sort -s -k1,1V -k2,2n -k3,3n
chr10   60379   60380   10:60380-60380  T/T
chr10   60379   60380   10:60380-60380  G/T


> $ sort -s -k1,1V -k2,2n -k3,3n --debug foo
> sort: using ‘en_US.UTF-8’ sorting rules
> sort: leading blanks are significant in key 1; consider also specifying 'b'
> chr10>60379>60380>10:60380-60380>G/T

Awesome! Most bug reports fail to provide this important piece of
information.

You may want to follow the advice there of adding 'b' (as in -k1b,1V);
but as far as I can tell, it shouldn't be affecting the behavior you are
seeing (since your sample file didn't have leading whitespace).
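
As a rough sketch of that suggestion, the 'b' modifier can be added to the first key and the result compared against the input. The scratch file and the variable names (sample, sorted, result) are placeholders introduced here, not part of the report. On an unpatched GNU sort I would expect the order to stay unchanged, since all three keys compare equal and -s disables the last-resort whole-line comparison:

```shell
# Recreate the two-line sample from the report in a scratch file
# (the temporary file stands in for the reporter's 'foo').
sample=$(mktemp)
printf 'chr10\t60379\t60380\t10:60380-60380\tT/T\nchr10\t60379\t60380\t10:60380-60380\tG/T\n' > "$sample"

# Same keys as in the report, with 'b' added to key 1 so that
# leading blanks in that field are ignored.
sorted=$(sort -s -k1b,1V -k2,2n -k3,3n "$sample")

# Compare the sorted output with the original file contents.
if [ "$sorted" = "$(cat "$sample")" ]; then
    result="unchanged"
else
    result="reordered"
fi
echo "$result"

rm -f "$sample"
```

If this reports "reordered" on the affected machine as well, the 'b' modifier is not the issue, which matches the expectation above.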

> $ sort --version
> sort (GNU coreutils) 8.22
> $ more /etc/*-release
> ::::::::::::::
> /etc/oracle-release
> ::::::::::::::
> Oracle Linux Server release 7.1
> $ uname -r
> 3.8.13-68.1.2.el7uek.x86_64

I suspect that the most-likely culprit is a downstream vendor bug (it is
not the first time that vendor I18N patches have caused sort to
misbehave, where upstream is just fine).  For example,
https://bugzilla.redhat.com/show_bug.cgi?id=1148347
says that some builds of RHEL 7 coreutils 8.22 had a broken I18N patch
that calls strcoll() on too much of the line being compared.  That would
certainly explain why your build seems affected, if the suffix 'G/T' vs.
'T/T' is being treated as significant, especially since you proved you
are using en_US.UTF-8 (and not LC_ALL=C).
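
One way to probe the locale theory is to run the same check under LC_ALL=C and under en_US.UTF-8 and see whether the verdict changes. This is only a sketch: the helper name check_order and the scratch file are made up for illustration, and if en_US.UTF-8 is not installed on a machine, sort simply falls back to the C rules, so both runs would then agree.

```shell
# Recreate the two-line sample from the report in a scratch file.
sample=$(mktemp)
printf 'chr10\t60379\t60380\t10:60380-60380\tT/T\nchr10\t60379\t60380\t10:60380-60380\tG/T\n' > "$sample"

# check_order LOCALE: run 'sort -c' with the report's keys under the
# given locale and print whether the file is considered sorted.
# (check_order is a hypothetical helper, not a coreutils command.)
check_order() {
    if LC_ALL=$1 sort -s -k1,1V -k2,2n -k3,3n -c "$sample" 2>/dev/null; then
        echo "in order"
    else
        echo "disorder"
    fi
}

c_result=$(check_order C)
utf8_result=$(check_order en_US.UTF-8)

echo "C:           $c_result"
echo "en_US.UTF-8: $utf8_result"

rm -f "$sample"
```

If the C run says "in order" while the en_US.UTF-8 run says "disorder", that would be consistent with a locale-specific code path (such as a vendor I18N patch) looking past the end of the keys, though it would not prove it.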

But that is as much as I can point to; at this point, you'll have to
take it up with Oracle.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
