coreutils

Re: uniq with sort-like "--key" support


From: Pádraig Brady
Subject: Re: uniq with sort-like "--key" support
Date: Tue, 12 Feb 2013 01:50:30 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1

On 02/12/2013 01:31 AM, Assaf Gordon wrote:
Hello,

I'd like to offer a proof-of-concept patch for adding sort-like "--key" support 
for the 'uniq' program, as discussed here:
   http://lists.gnu.org/archive/html/bug-coreutils/2006-06/msg00211.html
and in several other threads.

The patch involves a few core changes:
1. All key-related functions were copied as-is from "sort.c" and put in a separate file 
(uniq_sort_common.h). In theory, those could be extracted later on into a file used by both sort 
and uniq. At the moment, it's a hodge-podge of copy&paste, including code that's not relevant to 
uniq (like "reverse").

2. The function "check_files" was modified to convert "struct linebuffer" (used by uniq) 
to "struct line" (used by sort's functions).

3. The "different" function was modified to call sort's "keycompare" function.

4. In main(), the key argument passing was copied from 'sort', and some code was added to 
adapt previous options (e.g. skip-fields/skip-chars/check-chars) to the internal "struct 
keyfield".
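
For reference, the region of a line that the existing options select can be sketched as below. This is a Python illustration of uniq's documented behaviour (a field is a run of blanks followed by non-blanks), not the patch's actual mapping onto "struct keyfield", which is not shown in this message. The name legacy_key is hypothetical, and the line is assumed to have its trailing newline already removed.

```python
from typing import Optional


def legacy_key(line: str, skip_fields: int = 0, skip_chars: int = 0,
               check_chars: Optional[int] = None) -> str:
    """Return the part of `line` that uniq compares under the legacy options.

    Hypothetical helper: models --skip-fields, --skip-chars and
    --check-chars on a line without its trailing newline.
    """
    i = 0
    n = len(line)
    # Skip whole fields: each field is leading blanks, then non-blanks.
    for _ in range(skip_fields):
        if i >= n:
            break
        while i < n and line[i] in " \t":
            i += 1
        while i < n and line[i] not in " \t":
            i += 1
    # Then skip individual characters, never running past the end.
    i = min(i + skip_chars, n)
    rest = line[i:]
    # Finally limit the comparison to at most check_chars characters.
    return rest if check_chars is None else rest[:check_chars]
```

Any translation of the old options into a key specification would need to select the same region, including the leading blank that remains after skipping a field (legacy_key("A 1", skip_fields=1) keeps " 1").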

The result is that uniq can now do:
===
$ printf "A 1\nA 2\nB 2\n" | ./src/uniq -k1,1
A 1
B 2
$ printf "A 1\nA 2\nB 2\n" | ./src/uniq -k2,2
A 1
A 2
===
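
The two runs above can be modelled with a short Python sketch. It is only an approximation of the idea: fields are split on runs of whitespace, which is simpler than how sort.c delimits fields, and the names extract_key, different and uniq_by_key are illustrative rather than taken from the patch.

```python
from typing import Iterable, Iterator, Optional


def extract_key(line: str, start_field: int, end_field: int) -> str:
    """Return fields start_field..end_field (1-based, inclusive).

    Simplification: fields are whitespace-separated tokens.
    """
    fields = line.split()
    return " ".join(fields[start_field - 1:end_field])


def different(prev: str, cur: str, start_field: int, end_field: int) -> bool:
    """True when the two lines differ in the selected key."""
    return (extract_key(prev, start_field, end_field)
            != extract_key(cur, start_field, end_field))


def uniq_by_key(lines: Iterable[str], start_field: int,
                end_field: int) -> Iterator[str]:
    """Yield the first line of each run of adjacent lines with equal keys."""
    prev: Optional[str] = None
    for line in lines:
        if prev is None or different(prev, line, start_field, end_field):
            yield line
            prev = line  # the held line is the first of the current run


if __name__ == "__main__":
    data = ["A 1", "A 2", "B 2"]
    print(list(uniq_by_key(data, 1, 1)))  # key on field 1
    print(list(uniq_by_key(data, 2, 2)))  # key on field 2
```

With the sample input, keying on field 1 keeps "A 1" and "B 2", and keying on field 2 keeps "A 1" and "A 2", matching the output shown above.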

Most (but not all) of the existing tests pass.
New tests to demonstrate the new possibilities have been added to 
'tests/misc/uniq-key.pl', try with:
  make check TESTS=tests/misc/uniq-key SUBDIRS=.

I think that most of the key comparison functions (like 
numeric/general-numeric/month/version/skip-blanks) would "just work", though I 
haven't tested them thoroughly yet.
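
As an illustration of what a numeric key comparison would mean for uniq, here is a hedged Python sketch in which keys are compared by numeric value rather than as text. It only roughly mirrors sort's -n (a leading number is parsed, anything else counts as 0) and says nothing about how the patch itself behaves; numeric_value, field and uniq_by are hypothetical names.

```python
import re
from typing import Callable, Hashable, Iterable, Iterator

# Optional leading blanks, optional minus sign, then digits with an
# optional fractional part.
_NUM = re.compile(r"\s*(-?(?:\d+\.?\d*|\.\d+))")


def numeric_value(text: str) -> float:
    """Value of the leading number in `text`, or 0.0 if there is none."""
    m = _NUM.match(text)
    return float(m.group(1)) if m else 0.0


def field(line: str, n: int) -> str:
    """Return the n-th whitespace-separated field (1-based), or ''."""
    parts = line.split()
    return parts[n - 1] if n <= len(parts) else ""


def uniq_by(lines: Iterable[str],
            key: Callable[[str], Hashable]) -> Iterator[str]:
    """Yield the first line of each run of adjacent lines with equal keys."""
    first = True
    prev_key = None
    for line in lines:
        k = key(line)
        if first or k != prev_key:
            yield line
            prev_key = k
            first = False


if __name__ == "__main__":
    data = ["a 1", "b 1.0", "c 01", "d 2"]
    # Field 2 compared numerically: "1", "1.0" and "01" are the same key.
    print(list(uniq_by(data, key=lambda l: numeric_value(field(l, 2)))))
```

Under a plain text comparison all four lines would be kept. Under the numeric key only "a 1" and "d 2" remain, which is the kind of difference dedicated tests for each comparison mode would need to cover.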


Comments are welcome,
-gordon


I'm not going to look at it this week, but thank you!
Consolidating the field processing in a central place
is really good, and it can then be enhanced in future
to support multibyte chars etc.

cheers,
Pádraig.


