coreutils

Re: uniq - check specific fields


From: Pádraig Brady
Subject: Re: uniq - check specific fields
Date: Thu, 07 Feb 2013 17:34:25 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1

On 02/07/2013 05:13 PM, Assaf Gordon wrote:
> Hello,
>
> Attached is a proof-of-concept patch to add "--check-fields=N" to uniq,
> allowing uniq'ing by specific fields.
> (Trying a different approach at promoting csplit-by-field [1] :) ).
>
> It works just like 'check-chars' but on fields, and if not used, it does not
> affect the program flow.
> ===
>      # input file, every whole line is unique
>      $ cat input.txt
>      A 1 z
>      A 1 y
>      A 2 x
>      B 2 w
>      B 3 w
>      C 3 w
>      C 4 w
>
>      # regular uniq
>      $ uniq -c input.txt
>            1 A 1 z
>            1 A 1 y
>            1 A 2 x
>            1 B 2 w
>            1 B 3 w
>            1 C 3 w
>            1 C 4 w
>
>      # Stop after 1 field
>      $ uniq -c --check-fields 1 input.txt
>            3 A 1 z
>            2 B 2 w
>            2 C 3 w
>
>      # Stop after 2 fields
>      $ uniq -c --check-fields 2 input.txt
>            2 A 1 z
>            1 A 2 x
>            1 B 2 w
>            1 B 3 w
>            1 C 3 w
>            1 C 4 w
>
>      # Skip the first field and check 1 field (effectively, uniq on field 2)
>      $ uniq -c --skip-fields 1 --check-fields 1 input.txt
>            2 A 1 z
>            2 A 2 x
>            2 B 3 w
>            1 C 4 w
>
>      # "--field" is a convenience shortcut for skip&check fields
>      $ uniq -c --field 2 input.txt
>            2 A 1 z
>            2 A 2 x
>            2 B 3 w
>            1 C 4 w
>      $ uniq -c --field 3 input.txt
>            1 A 1 z
>            1 A 1 y
>            1 A 2 x
>            4 B 2 w
> ===
>
> What do you think?

Useful, but only a partial solution as discussed here:

http://lists.gnu.org/archive/html/bug-coreutils/2006-06/msg00211.html
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=5832

I.e., essentially this patch has been rejected before;
being able to specify --key to uniq, just like sort,
would be much preferred.
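
For illustration only, the single-field comparison from the examples
above can be approximated with existing tools. This is a rough sketch,
not the patch and not a coreutils feature: uniq_by_field is a
hypothetical helper name, and it splits fields the way awk does by
default (runs of blanks), which may differ from uniq's own field
handling in corner cases.

```shell
# Hypothetical helper (assumption, not part of coreutils):
# count runs of adjacent lines whose Nth blank-separated field is equal,
# printing the first line of each run in the style of `uniq -c`.
uniq_by_field() {
    awk -v f="$1" '
        # Append "" so the key is always compared as a string, not a number.
                              { key = $f "" }
        # Key changed: emit the finished run and start a new one.
        NR > 1 && key != prev { printf "%7d %s\n", n, first; n = 0 }
        # First line of a run is the one that gets printed.
        n == 0                { first = $0 }
                              { n++; prev = key }
        # Flush the last run (nothing to print on empty input).
        END                   { if (n) printf "%7d %s\n", n, first }
    '
}
```

Run as "uniq_by_field 2 < input.txt", this should give the same counts
and lines as the "--field 2" example quoted above. Like uniq, it only
merges adjacent lines, so unsorted input may need sorting on that field
first.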

To avoid redundant coding it's always good to
touch base with the list first on ideas,
or search the bug database.

cheers,
Pádraig


