coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: how to speed up sort for partially sorted input?


From: Carl Edquist
Subject: Re: how to speed up sort for partially sorted input?
Date: Wed, 25 Aug 2021 08:09:26 -0500 (CDT)
User-agent: Alpine 2.23.2 (DEB 496 2020-07-19)

On Wed, 25 Aug 2021, arthur200126@gmail.com wrote:

The current sort algorithm used by coreutils sort(1) is a mergesort, which unfortunately does not pick up cues on partially-sorted input. A relatively well-known algorithm that picks up partial sort order is timsort [...]

 [1]: https://en.wikipedia.org/wiki/Timsort

If i read correctly, timsort takes advantage of input sections that are already sorted ("runs"), by merge-sorting consecutive runs.

This is a bit different than the partially sorted input originally described in this thread, where you are sorting on (in this case) column 1, then column 2, but it is known that the input is already sorted according to column 1.

In the timsort case, if you have two consecutive runs, A and B, then you have

        i < j  --> A[i] < A[j]
and
        i < j  --> B[i] < B[j]

but there is no guarantee that

        A[i] < B[j]

On the other hand, the scenario for this thread is the opposite. If you have two consecutive chuncks, A (column 1 = A), and B (column 1 = B), then you always have

        A[i] < B[j]

but within chunk A the lines are unsorted, as within chunk B.

...

So, while a timsort algorithm might be an interesting alternative, it doesn't seem it will improve things for this particular case.

Carl



reply via email to

[Prev in Thread] Current Thread [Next in Thread]