coreutils

potential feature addition to coreutils' sort.c: print at most N lines


From: James Dowdell
Subject: potential feature addition to coreutils' sort.c: print at most N lines
Date: Sun, 3 Mar 2013 09:32:38 -0800

I'm considering writing a patch for sort.c to add a new feature, related to a Stack Overflow question I asked (http://stackoverflow.com/questions/14882897/what-standard-commands-can-i-use-to-print-just-the-first-few-lines-of-sorted-out).

This would be my first patch, and this is my first time messaging a GNU list; apologies if I'm "doing it wrong."

I use GNU sort a lot, and routinely find myself in the situation of executing, e.g.:

$ sort ... | head -n 1000

This can be needlessly slow when the input is huge, because sort fully orders every line of input even though head discards everything after the first 1000.

I propose a new option, "-H, --head=NLINES", which makes sort print at most NLINES lines of output.  Unlike a filter applied afterward, such as | head, it would let sort skip the work of ordering anything beyond the first NLINES lines.
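To illustrate why this can be cheaper than a full sort, here is a minimal Python sketch (not sort.c code, and not how GNU sort is implemented). The function name sorted_head is hypothetical. It relies on the standard library's heapq.nsmallest, which returns the same result as sorted(lines)[:nlines] while only tracking nlines candidates at a time.

```python
import heapq
from typing import Iterable, List


def sorted_head(lines: Iterable[str], nlines: int) -> List[str]:
    """Return the first `nlines` lines of the sorted input.

    Hypothetical helper for illustration only. It gives the same result
    as sorted(lines)[:nlines], but heapq.nsmallest keeps at most `nlines`
    candidates in memory instead of ordering the whole input.
    """
    if nlines <= 0:
        return []
    return heapq.nsmallest(nlines, lines)


if __name__ == "__main__":
    sample = ["pear", "apple", "fig", "kiwi", "banana"]
    for line in sorted_head(sample, 2):
        print(line)
```

For small NLINES relative to the input size, the work grows roughly with (input size) * log(NLINES) rather than (input size) * log(input size).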

I want to know the procedure for submitting a patch, and the likelihood that such a patch would even be considered, before I spend time reading through the whole sort.c file and proposing a complete and rigorous solution (which would amount to writing the patch itself).

From a quick glance at the source, my current strategy would be to alter the merge nodes when this option is set, so that the number of lines held per node is clamped to NLINES.  While less efficient than an ideal solution, it would be more efficient than what's currently in place, and it has the benefits of minimal code edits and a negligible performance cost for ordinary use when the option is not passed.
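The clamping idea can be sketched outside of sort.c as follows. This is a Python illustration under my own assumptions, not the actual merge-node code: the names clamped_merge and sort_head and the run_size parameter are hypothetical. It works because any line that belongs in the global first NLINES must also be among the first NLINES of whichever run contains it, so truncating every run and every merge result to NLINES loses nothing.

```python
import heapq
from itertools import islice
from typing import List, Sequence


def clamped_merge(left: Sequence[str], right: Sequence[str], nlines: int) -> List[str]:
    """Merge two sorted runs, keeping only the first `nlines` lines."""
    return list(islice(heapq.merge(left, right), nlines))


def sort_head(lines: Sequence[str], nlines: int, run_size: int = 4) -> List[str]:
    """Merge sort in which every run is clamped to `nlines` lines.

    Hypothetical sketch: `run_size` stands in for however the real
    program would split its input into initially sorted runs.
    """
    if nlines <= 0 or run_size <= 0:
        return []
    # Sort each small run, then clamp it to nlines.
    runs = [sorted(lines[i:i + run_size])[:nlines]
            for i in range(0, len(lines), run_size)]
    # Merge runs pairwise; each merged run is clamped as well.
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), 2):
            if i + 1 < len(runs):
                merged.append(clamped_merge(runs[i], runs[i + 1], nlines))
            else:
                merged.append(runs[i])
        runs = merged
    return runs[0] if runs else []


if __name__ == "__main__":
    sample = ["pear", "apple", "fig", "kiwi", "banana", "cherry"]
    for line in sort_head(sample, 3):
        print(line)
```

When the option is absent, nothing is clamped and the merge behaves as before, which is the "negligible cost for ordinary use" property described above.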

All feedback welcome, thank you.

-James
