coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: sort complexity on nearly sorted input


From: Pádraig Brady
Subject: Re: sort complexity on nearly sorted input
Date: Sun, 12 Feb 2012 00:06:06 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:6.0) Gecko/20110816 Thunderbird/6.0

On 02/11/2012 04:20 PM, Peng Yu wrote:
> Hi,
> 
> I assume the time complexity of 'sort' is log N, where N is the input size.
> 
> But I'm not familiar with 'sort' enough to tell the complexity of
> sorting a nearly sorted input. Suppose that I have a listed of N
> numbers, there only k numbers (k << N, say k=N/100) that are not in
> the correct position and all the other N-k numbers are in the correct
> position. Does anybody knows the time complexity of this case, or it
> is still log N?
> 

Well discounting all the memory, thread and data type management
that `sort` gives you, centrally there is this description
on sequential_sort():

   Use a recursive divide-and-conquer algorithm, in the style
   suggested by Knuth volume 3 (2nd edition), exercise 5.2.4-23.  Use
   the optimization suggested by exercise 5.2.4-10; this requires room
   for only 1.5*N lines, rather than the usual 2*N lines.  Knuth
   writes that this memory optimization was originally published by
   D. A. Bell, Comp J. 1 (1958), 75.

I searched for the above text out of interest, and noticed
this good overview of the operation of `sort`:
http://philip.greenspun.com/software/abstraction-filtration-comparison/sort/


It's trivial to empirically measure how sort behaves with different inputs:

$ for i in $(seq 4 7); do
>   seq -w $((10**$i)) > sorted.$i
>   shuf < sorted.$i > unsorted.$i
> done

$ export OMP_NUM_THREADS=1
$ export LANG=C

$ for i in $(seq 4 7); do
>   time sort sorted.$i > /dev/null
>   time sort unsorted.$i > /dev/null
> done

We can also compare python's in place "timsort",
given that the data set fits in RAM and timsort
is said to be especially good for mostly sorted data:

$ for type in sorted unsorted; do
>   time python -c "
> import sys
> a=sys.stdin.readlines()
> a.sort()
> sys.stdout.writelines(a)
> " > /dev/null < $type.7
> done

Results:
N         sorted   unsorted  py_sorted  py_unsorted
10000      0.015      0.021
100000     0.058      0.088
1000000    0.522      1.105
10000000   7.020     15.644     2.882        26.285

cheers,
Pádraig.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]