bug-coreutils

bug#7961: sort


From: Eric Blake
Subject: bug#7961: sort
Date: Wed, 02 Feb 2011 10:44:00 -0700
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101209 Fedora/3.1.7-0.35.b3pre.fc14 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.7

On 02/02/2011 05:42 AM, Francesco Bettella wrote:
> hi,
> I may have bumped into an undesired feature/bug of sort, which appears to be 
> still present in the version 8.9 of coreutils.

Thanks for the report.  However, this is a feature, and not a bug, of sort.

> 
> I'm issuing the following sort commands (see attached files):
> 
> [prompt1] > sort -k 1.4,1n asd1 > asd1.sorted
> 
> [prompt2] > sort -k 2.4,2n asd2 > asd2.sorted

If I'm correct, asd1 and asd2 have the same contents, except that you
have swapped columns 1 and 2 between the two and resorted the lines.
And your desired goal is that the output matches asd1.sorted, again with
the columns swapped for asd2.sorted.

> 
> the first one works as I would expect, the second one doesn't.

Let's examine why:

$ head -3 asd1 | sort -k 1.4,1n --debug
sort: using `en_US.UTF-8' sorting rules
sort: leading blanks are significant in key 1; consider also specifying `b'
chr>coding_gene
   ^ no match for key
_______________
chr1>PRAMEF1
   _
____________
chr1>PRAMEF4
   _
____________
$ head -3 asd1 | LC_ALL=C sort -k 1.4,1n --debug
sort: using simple byte comparison
sort: leading blanks are significant in key 1; consider also specifying `b'
chr>coding_gene
   ^ no match for key
_______________
chr1>PRAMEF1
   _
____________
chr1>PRAMEF4
   _
____________

In both cases, a line with no match for the key has an empty key, which
numeric sorting treats as zero, so that line sorts first; meanwhile,
lines whose keys compare equal fall back to a comparison of the complete
line, so that the end result matches asd1.sorted whether you use the C
locale or dictionary sorting.
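
As a minimal sketch of the two rules just described (an empty numeric key compares as zero, and ties fall back to a whole-line comparison), the commands below use a hypothetical three-line, TAB-separated sample modeled on asd1 rather than the reporter's actual file. The -s option is used only to switch off the last-resort comparison so its effect becomes visible.

```shell
# Hypothetical sample modeled on asd1 (TAB-separated); not the real file.
sample=$(printf 'chr1\tPRAMEF4\nchr\tcoding_gene\nchr1\tPRAMEF1\n')

# Key = field 1 from character 4 onward, compared numerically.
# "chr" has no 4th character, so its key is empty and compares as 0.
# Both "chr1" lines have key 1 and tie, so the last-resort whole-line
# comparison puts PRAMEF1 ahead of PRAMEF4.
sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort -k 1.4,1n)
printf '%s\n' "$sorted"

# With -s (stable sort) the last-resort comparison is disabled, so the
# tied "chr1" lines keep their input order (PRAMEF4 before PRAMEF1).
stable=$(printf '%s\n' "$sample" | LC_ALL=C sort -s -k 1.4,1n)
printf '%s\n' "$stable"
```

LC_ALL=C is set here only to make the tie-breaking order reproducible across systems.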

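But notice that warning about not using -b, and how it affects asd2.
Without -b or -t, the blank that separates the fields counts as the
first character of field 2, so 2.4 lands on the `r' of chr rather than
on the digit, and no key is numeric (also note how the difference in
dictionary vs. byte-ordering plays a role in the secondary sorting):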

$ head -3 asd2 | sort -k 2.4,2n --debug
sort: using `en_US.UTF-8' sorting rules
sort: leading blanks are significant in key 1; consider also specifying `b'
coding_gene>chr
              ^ no match for key
_______________
PRAMEF1>chr1
          ^ no match for key
____________
PRAMEF4>chr1
          ^ no match for key
____________
$ head -3 asd2 | LC_ALL=C sort -k 2.4,2n --debug
sort: using simple byte comparison
sort: leading blanks are significant in key 1; consider also specifying `b'
PRAMEF1>chr1
          ^ no match for key
____________
PRAMEF4>chr1
          ^ no match for key
____________
coding_gene>chr
              ^ no match for key

But when you add -b (note, b is the one option you have to attach to the
start position here, since it applies to the start and end positions
individually; all other options can be attached to the start, the end,
or both, and affect the entire key):

$ head -3 asd2 | sort -k 2.4b,2n --debug
sort: using `en_US.UTF-8' sorting rules
coding_gene>chr
               ^ no match for key
_______________
PRAMEF1>chr1
           _
____________
PRAMEF4>chr1
           _
____________
$ head -3 asd2 | LC_ALL=C coreutils/src/sort -k 2.4b,2n --debug
coreutils/src/sort: using simple byte comparison
coding_gene>chr
               ^ no match for key
_______________
PRAMEF1>chr1
           _
____________
PRAMEF4>chr1
           _
____________

That is, your key specification was incomplete - without enough
additional information, sort correctly followed what you told it
to do, but what you told it was not what you meant.  And the --debug
option is your [new] friend :)
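
For comparison, here is a small sketch of the asd2 case on a hypothetical TAB-separated sample (again, not the reporter's file). It runs the key without b, with b on the start position, and with an explicit TAB separator via -t. The -t variant is an alternative not used in the commands above: with -t the separator is not part of any field, so character 4 of field 2 is already the digit.

```shell
# Hypothetical sample modeled on asd2 (TAB-separated); not the real file.
sample=$(printf 'PRAMEF4\tchr1\ncoding_gene\tchr\nPRAMEF1\tchr1\n')
tab=$(printf '\t')

# No b, no -t: field 2 begins with the TAB, so 2.4 starts at "r".
# Every key is non-numeric (compares as 0); only the last-resort
# whole-line byte comparison orders the output.
plain=$(printf '%s\n' "$sample" | LC_ALL=C sort -k 2.4,2n)

# b on the start position skips the leading blank, so 2.4 is the digit.
with_b=$(printf '%s\n' "$sample" | LC_ALL=C sort -k 2.4b,2n)

# An explicit TAB separator excludes the TAB from field 2 as well.
with_t=$(printf '%s\n' "$sample" | LC_ALL=C sort -t "$tab" -k 2.4,2n)

printf '%s\n\n%s\n\n%s\n' "$plain" "$with_b" "$with_t"
```

On this sample the b and -t variants should produce the same order, with the coding_gene line first, while the plain variant leaves it last in the C locale.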

-- 
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org


