Re: how to speed up sort for partially sorted input?

coreutils

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: how to speed up sort for partially sorted input?

From:	Carl Edquist
Subject:	Re: how to speed up sort for partially sorted input?
Date:	Wed, 11 Aug 2021 05:29:08 -0500 (CDT)
User-agent:	Alpine 2.23.2 (DEB 496 2020-07-19)

On Tue, 10 Aug 2021, Kaz Kylheku (Coreutils) wrote:

On 2021-08-07 17:46, Peng Yu wrote:
 Hi,

 Suppose that I want to sort an input by column 1 and column 2 (column
 1 is of a higher priority than column 2). The input is already sorted
 by column1.

 Is there a way to speed up the sort (compared with not knowing column
 1 is already sorted)? Thanks.
Since you know that colum 1 is sorted, it means that a sequential scanof the data will reveal chunks that have the same colum1 value.
You just have to read and separate these chunks, and sort each oneindividually by column 2.


Neat observation.

You could do that tersely in awk by piping each chunk to a separate sortprocess, like:


        awk '
            c1 != $1 { close(sort); c1 = $1 }
            { print | sort }
        ' sort="sort -k2,2" partially-sorted-input.txt

In theory, that would bring the sorting work down from ~ O(n * log(n)) to~ O(n * log(n/m)) (for a partially-sorted file with n lines and m

column-1 chunks of equal size).

But the overhead of starting a new sort process for each chunk is likelygoing to outweigh that advantage. In the end, just sorting the whole fileat once (despite column 1 already being sorted) is still likely to befaster.

(With just a bit more work, you can do all your sorting in a single awkprocess too (without piping out to sort), but i think you'll still bedisappointed with the performance compared to a single sort command.)



Carl

[Prev in Thread]

Current Thread

[Next in Thread]

Re: how to speed up sort for partially sorted input?, Kaz Kylheku (Coreutils), 2021/08/11
- Re: how to speed up sort for partially sorted input?, Kaz Kylheku (Coreutils), 2021/08/11
- Re: how to speed up sort for partially sorted input?, Carl Edquist <=
  - Re: how to speed up sort for partially sorted input?, Peng Yu, 2021/08/11
    - Re: how to speed up sort for partially sorted input?, Kaz Kylheku (Coreutils), 2021/08/11
    - Re: how to speed up sort for partially sorted input?, Peng Yu, 2021/08/11
    - Re: how to speed up sort for partially sorted input?, Carl Edquist, 2021/08/12
- Re: how to speed up sort for partially sorted input?, Kaz Kylheku (Coreutils), 2021/08/12
- Re: how to speed up sort for partially sorted input?, Carl Edquist, 2021/08/13

Prev by Date: Re: how to speed up sort for partially sorted input?
Next by Date: Re: how to speed up sort for partially sorted input?
Previous by thread: Re: how to speed up sort for partially sorted input?
Next by thread: Re: how to speed up sort for partially sorted input?
Index(es):
- Date
- Thread