bug-coreutils

bug#7961: sort


From: Francesco Bettella
Subject: bug#7961: sort
Date: Wed, 2 Feb 2011 19:05:33 +0000
User-agent: KMail/1.9.4

Thank you very much for your time, and sorry for the trouble.
If I understand this right, specifying 'b' in the start field spares me the
fallback sort of the complete line, and this actually does the trick.
I remain a little in the dark regarding the dictionary vs. byte (en_US.UTF-8
vs. C) ordering. I've tried both on asd2 (without the 'b') with the same
result, but I trust you on this one.
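
For anyone reading this thread later, here is a minimal sketch of the effect of 'b' on the start field. The three-line sample is hypothetical (it only mimics the tab-separated layout of asd2), and GNU sort is assumed. Without 'b', field 2 begins at the tab, so character 4 is the "r" of "chr" and every key is non-numeric; all keys then compare equal and only the whole-line comparison decides the order. With 'b', the leading blank is skipped, character 4 is the first digit, and the numeric key decides.

```shell
# Hypothetical sample data: gene name, a tab, then a "chrN" label.
data=$(printf 'geneB\tchr10\ngeneA\tchr2\ngeneC\tchr1\n')

# Without 'b': the key starts at "r...", which is not a number, so all
# keys tie and the last-resort whole-line comparison orders the lines.
without_b=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2.4,2n)

# With 'b' on the start field: leading blanks are skipped before counting
# characters, so the key starts at the digits and sorts numerically.
with_b=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2.4b,2n)

printf 'without b:\n%s\n\nwith b:\n%s\n' "$without_b" "$with_b"
```

So 'b' does not switch the whole-line comparison off; it makes the key land on the digits, and the whole-line comparison is left as a tie-breaker only.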

Francesco

P.S.: just got Gordon's reply. thank you for that.



On Wed February 2 2011 17:44, Eric Blake wrote:
> On 02/02/2011 05:42 AM, Francesco Bettella wrote:
> > hi,
> > I may have bumped into an undesired feature/bug of sort, which appears to be
> > still present in the version 8.9 of coreutils.
> 
> Thanks for the report.  However, this is a feature, and not a bug, of sort.
> 
> > 
> > I'm issuing the following sort commands (see attached files):
> > 
> > [prompt1] > sort -k 1.4,1n asd1 > asd1.sorted
> > 
> > [prompt2] > sort -k 2.4,2n asd2 > asd2.sorted
> 
> If I'm correct, asd1 and asd2 have the same contents, except that you
> have swapped columns 1 and 2 between the two and resorted the lines.
> And your desired goal is that the output matches asd1.sorted, again with
> the columns swapped for asd2.sorted.
> 
> > 
> > the first one works as I would expect, the second one doesn't.
> 
> Let's examine why:
> 
> $ head -3 asd1 | sort -k 1.4,1n --debug
> sort: using `en_US.UTF-8' sorting rules
> sort: leading blanks are significant in key 1; consider also specifying `b'
> chr>coding_gene
>    ^ no match for key
> _______________
> chr1>PRAMEF1
>    _
> ____________
> chr1>PRAMEF4
>    _
> ____________
> $ head -3 asd1 | LC_ALL=C sort -k 1.4,1n --debug
> sort: using simple byte comparison
> sort: leading blanks are significant in key 1; consider also specifying `b'
> chr>coding_gene
>    ^ no match for key
> _______________
> chr1>PRAMEF1
>    _
> ____________
> chr1>PRAMEF4
>    _
> ____________
> 
> In both cases, when there is no match for a key but numeric sorting was
> requested, then that line sorts first; meanwhile, you get the fallback
> sort of the complete line after the first key has been sorted, so that
> the end result matches asd1.sorted whether you use the C locale or
> dictionary sorting.
> 
> But notice that warning about not using -b, and how it affects asd2 (and
> also, how the difference in dictionary vs. byte-ordering plays a role in
> the secondary sorting):
> 
> $ head -3 asd2 | sort -k 2.4,2n --debug
> sort: using `en_US.UTF-8' sorting rules
> sort: leading blanks are significant in key 1; consider also specifying `b'
> coding_gene>chr
>               ^ no match for key
> _______________
> PRAMEF1>chr1
>           ^ no match for key
> ____________
> PRAMEF4>chr1
>           ^ no match for key
> ____________
> $ head -3 asd2 | LC_ALL=C sort -k 2.4,2n --debug
> sort: using simple byte comparison
> sort: leading blanks are significant in key 1; consider also specifying `b'
> PRAMEF1>chr1
>           ^ no match for key
> ____________
> PRAMEF4>chr1
>           ^ no match for key
> ____________
> coding_gene>chr
>               ^ no match for key
> 
> But when you add -b (note, b is the one option you have to add to the
> start field, since it affects start and end fields specially; all other
> options can be added to start, end, or both, and affect the entire key):
> 
> $ head -3 asd2 | sort -k 2.4b,2n --debug
> sort: using `en_US.UTF-8' sorting rules
> coding_gene>chr
>                ^ no match for key
> _______________
> PRAMEF1>chr1
>            _
> ____________
> PRAMEF4>chr1
>            _
> ____________
> $ head -3 asd2 | LC_ALL=C coreutils/src/sort -k 2.4b,2n --debug
> coreutils/src/sort: using simple byte comparison
> coding_gene>chr
>                ^ no match for key
> _______________
> PRAMEF1>chr1
>            _
> ____________
> PRAMEF4>chr1
>            _
> ____________
> 
> That is, your expectations were insufficient - without telling sort
> enough additional information, sort correctly followed what you told it
> to do, but what you told it was not what you meant.  And the --debug
> option is your [new] friend :)
> 
> -- 
> Eric Blake   address@hidden    +1-801-349-2682
> Libvirt virtualization library http://libvirt.org
> 
> 
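
The "fallback sort of the complete line" mentioned in the quoted reply can be made visible with GNU sort's -s (--stable) option, which disables that last-resort comparison. This is a small sketch on hypothetical sample data (not the attached asd2 file): the key is deliberately left without 'b', so all keys tie, and the only difference between the two runs is whether tied lines are reordered by their full contents.

```shell
# Hypothetical sample data: gene name, a tab, then a "chrN" label.
data=$(printf 'geneB\tchr10\ngeneA\tchr2\ngeneC\tchr1\n')

# Default behaviour: keys tie (all non-numeric), so sort falls back to
# comparing whole lines and the output is ordered by gene name.
fallback=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2.4,2n)

# With -s the last-resort comparison is off: tied lines keep input order.
stable=$(printf '%s\n' "$data" | LC_ALL=C sort -s -k 2.4,2n)

printf 'fallback:\n%s\n\nstable:\n%s\n' "$fallback" "$stable"
```

If the -s output is simply the input order, the key matched nothing useful, which is the same diagnosis --debug gives with its "no match for key" lines.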




