coreutils

Re: how to speed up sort for partially sorted input?


From: Kaz Kylheku (Coreutils)
Subject: Re: how to speed up sort for partially sorted input?
Date: Thu, 12 Aug 2021 18:53:03 -0700
User-agent: Roundcube Webmail/0.9.2

On 2021-08-11 11:58, Peng Yu wrote:
> On Wed, Aug 11, 2021 at 1:43 PM Kaz Kylheku (Coreutils)
> <962-396-1872@kylheku.com> wrote:
>> On 2021-08-11 05:03, Peng Yu wrote:
>>> On Wed, Aug 11, 2021 at 5:29 AM Carl Edquist <edquist@cs.wisc.edu>
>>> wrote:
>>>> (With just a bit more work, you can do all your sorting in a single awk
>>>> process too (without piping out to sort), but I think you'll still be
>>               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> disappointed with the performance compared to a single sort command.)
>>>
>>> Yes, this involves many calls of the coreutils' sort, which is not
>>       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>> No, not this last remark, which is about "in a single awk process".
>
> I know there is one awk process. I don't understand why you mentioned it.

(That's why.)

>>> efficient. Would it make sense to add an option in sort so that sort
>>> can sort a partially sorted input in one shot?
>>
>> If you're willing to use GNU Coreutils instead of Unix, you probably
>> have
>
> I don't think using awk is efficient. I have written a number of awk
> programs for simple input transformations and tested them; in general,
> they are slower than the equivalent Python code, let alone C code.
>
> You can talk about doing most of the work in awk below. I don't think
> that makes sense. Having coreutils' sort be able to do a partial sort
> is a more reasonable solution.

That solution doesn't exist today, whereas that Gawk program should
run even on ten-year-old installations.

For the solution to be useful, it only has to beat the actual sort
which you have available today, not some imagined version of sort
that isn't yet available.
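
To make the idea of exploiting existing order concrete, here is a minimal
sketch. It is written in Python rather than Gawk, and it is not the program
discussed earlier in the thread. It assumes that "partially sorted" means the
input is already ordered on its first field, so only each run of equal first
fields needs sorting. The function name sort_within_groups, the tab separator,
and the sample rows are illustrative assumptions.

```python
from itertools import groupby


def sort_within_groups(lines, sep="\t"):
    """Yield lines in fully sorted order, assuming the input is already
    ordered on its first field (an assumption, not checked here)."""

    def primary(line):
        # First field; the whole line if the separator is absent.
        return line.split(sep, 1)[0]

    # groupby only merges adjacent lines with equal keys, which is exactly
    # what input already ordered on the first field provides.
    for _, run in groupby(lines, key=primary):
        # Within a run the first field is identical, so comparing whole
        # lines orders them by the remaining fields (as plain strings).
        yield from sorted(run)


rows = ["a\t3", "a\t1", "b\t2", "b\t0", "c\t5"]
result = list(sort_within_groups(rows))
print(result)
```

Each run is sorted in memory on its own, so the cost depends on the size of
the largest run rather than on the whole input. Whether that beats a single
sort invocation on real data would have to be measured.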

I'm assuming that you're posting here because you have some real
problem to solve, not just to postulate chrome plating for Coreutils,
and therefore that a program which works today would be of use to you.

A vast amount of useful computing is being done with tools and
approaches that are not thoroughly optimized.

Sometimes those approaches usefully prototype a solution which
is later optimized or replaced; in the meantime, that solution
serves a useful purpose.



