Re: [PATCH]: uniq: add "--group" option


From: Assaf Gordon
Subject: Re: [PATCH]: uniq: add "--group" option
Date: Thu, 21 Feb 2013 10:42:23 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.4) Gecko/20120510 Icedove/10.0.4

Hello Pádraig,

Pádraig Brady wrote, On 02/20/2013 08:47 PM:
> On 02/20/2013 06:44 PM, Assaf Gordon wrote:
>> Hello,
>>
>> Attached is a suggestion for "--group" option in uniq, as discussed here:
>>    http://lists.gnu.org/archive/html/coreutils/2011-03/msg00000.html
>>    http://lists.gnu.org/archive/html/coreutils/2012-03/msg00052.html
>>
>> The patch adds two parameters:
>>        --group=[method]  separate each unique line (whether duplicated or not)
>>                          with a marker.
>>                          method={none,separate(default),prepend,append,both}
>>        --group-separator=SEP   with --group, separates group using SEP
>>                          (default: empty line)
>>
> 
> --group-sep is probably overkill.
> I'd just use \n or \0 if -z specified.
> 
OK.

> As for separation methods I'd just go with what we have for
> --all-repeated (but remove 'none' which wouldn't be useful with --group),
> as we've never had requests for anything else. so:
> --group={prepend, separate(default)}
> 

I'd like to have at least "append" or "both", for the added convenience of 
downstream analysis.
It's obviously a "nice-to-have" and not "must-have" feature, and can be 
implemented in other ways, but knowing that there will always be a terminating 
marker *after* a group (even the last group) makes downstream processing code 
simpler.

Typical example:
 $ cat INPUT | uniq --group=append | \
      awk '$0!="" { ## item in the group, collect it }
           $0=="" { ## end of group, do something }'
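To make the placeholders concrete, here is a minimal sketch that counts the lines in each group. The input is hard-coded to what the proposed "append" method is assumed to emit for "a a a b c c" (every group, including the last, followed by an empty line), so it does not depend on the patched uniq. The function name count_groups_append is hypothetical.

```shell
# Hypothetical downstream handler for append-style output:
# every group is terminated by an empty line, so one rule handles all groups.
count_groups_append() {
    awk '$0 != "" { key = $0; n++ }               # item in the group, collect it
         $0 == "" { print key ": " n; n = 0 }     # end of group, report and reset'
}

# Assumed shape of "uniq --group=append" output for the input a a a b c c
printf 'a\na\na\n\nb\n\nc\nc\n\n' | count_groups_append
```

All three groups are reported from the same rule; no END clause is needed.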

Without the final group marker, any downstream code will require two points of 
"group processing": when a marker is found, and at EOF.
Something like:

 $ cat INPUT | uniq --group=separate | \
      awk '$0!="" { ## item in the group, collect it }
           $0=="" { ## end of group, do something }
           END { ## end of last group, do something, duplicated code }'
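As a rough illustration of the extra work, the sketch below handles separate-style input, where the last group has no trailing marker. The input is hard-coded under the assumption that "separate" only places an empty line between groups; the names count_groups_separate and emit are hypothetical.

```shell
# Hypothetical downstream handler for separate-style output:
# the last group has no trailing marker, so it must also be flushed at EOF.
count_groups_separate() {
    awk 'function emit() { print key ": " n; n = 0 }
         $0 != "" { key = $0; n++ }       # item in the group, collect it
         $0 == "" { emit() }              # end of group (marker seen)
         END      { if (n > 0) emit() }   # end of last group (no marker)'
}

# Assumed shape of "uniq --group=separate" output for the input a a a b c c
printf 'a\na\na\n\nb\n\nc\nc\n' | count_groups_separate
```

Even with the shared function, the group-end logic has two call sites, and the END rule needs a guard for empty input.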

Similar reasoning applies to "both": it ensures that I can put any special
initialization code in the group-marker case and don't need to duplicate it
in a separate 'BEGIN{}' clause. (Of course, this doesn't have to be awk; it
can be perl/python/ruby/whatever that does the downstream processing.)
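A minimal sketch of the "both" case, assuming that method emits one empty line before the first group, one between groups, and one after the last group. The marker rule then closes the previous group (if any) and resets state for the next one, with no BEGIN or END clause. The function name count_groups_both is hypothetical, and the input is hard-coded rather than produced by the patched uniq.

```shell
# Hypothetical downstream handler for both-style output:
# a marker precedes the first group and follows the last one, so the
# marker rule does both per-group initialization and end-of-group work.
count_groups_both() {
    awk '$0 == "" { if (n > 0) print key ": " n   # close the previous group, if any
                    n = 0; key = ""               # per-group initialization
                    next }
         { key = $0; n++ }                        # item in the group, collect it'
}

# Assumed shape of "uniq --group=both" output for the input a a a b c c
printf '\na\na\na\n\nb\n\nc\nc\n\n' | count_groups_both
```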

I realize it's not a "make-or-break" feature - but if we're trying to make text 
processing easier, I believe "append/both" makes it even easier.


> So on to operation...
> 
>> And it behaves "as expected":
>> ===
>> $ printf "a\na\na\nb\nc\nc\n" | ./src/uniq --group-sep="--" --group=separate
> 
> The above isn't that useful and could be done with sed.
> 
I assume you're specifically referring to the "group-sep" part - then OK.


> Supporting -u or -d with --group wouldn't be useful either really.
> It's probably most consistent to just disallow those combinations.
> 

Just to be clear on the reasoning: because with "-u" and "-d", each *line* is 
implicitly a separate group, there's no apparent utility for an end-of-group 
marker.

I guess it's true from a technical POV - but again, for downstream analysis 
convenience it's nice to have a fixed end-of-group marker.
I could use the same downstream script (which expects end-of-group markers) 
with uniq, whether I used "-d" or "-u" or nothing at all.

What do you think?
 -gordon






