coreutils
Re: Feature request: uniq - Adding regex option to consolidate lines ?


From: Pádraig Brady
Subject: Re: Feature request: uniq - Adding regex option to consolidate lines ?
Date: Tue, 15 Oct 2013 10:30:59 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2

On 10/14/2013 07:56 AM, Javier Barroso wrote:
> Hello,
> 
> Currently uniq can distinguish lines by several criteria (-f, -i,
> -s, -w). Maybe another criterion could be added (-r / --regex?)
> 
> The idea is to consider a line that matches the regex (the regex being
> the argument passed to the "-r" option) a duplicate of another line
> that matches that regex too.
> 
> For the moment I only have a Perl script that does this
> ("uniqregex regex" would be the synopsis), only to show you what I
> mean:
> 
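> #!/usr/bin/perl -w
> use strict;
> 
> # The first argument is the regex; remove it from @ARGV so that
> # <> reads the remaining arguments as files (or stdin if none).
> my $regex = shift @ARGV;
> # Number of consecutive matching lines seen so far
> my $count = 0;
> # Read each input line into the implicit variable $_
> while (<>)
> {
>     if (/$regex/)
>     {
>         $count = $count + 1;
>     }
>     else
>     {
>         $count = 0;
>     }
>     # Print non-matching lines and the first line of each matching run
>     if ($count < 2)
>     {
>         print;
>     }
> }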
> 
> I see you already have regex code in the nl program, so maybe adding
> it to uniq is not a bad idea.
> 
> The first use I had for that script was to consolidate Java exceptions,
> where a large number of "[[:space:]]*at" lines appear after
> the error message. That way I can tail catalina.out (from Tomcat) and
> read just the error messages.
> 
> I am referring to:
> 
> java.lang.NumberFormatException: 12
>         at java.lang.Integer.parseInt(Compiled Code)
>         at java.lang.Integer.parseInt(Integer.java:390)
>         at excepcion.ExcepcionApp.main(ExcepcionApp.java:8)

While at first glance this might seem like a useful addition to uniq,
I'm not so sure.

So you're suggesting using a regex to define the match function.
The same argument would apply to join(1) and comm(1) for example.
However this would be a boolean match function rather than something
having a defined order, and so would not apply to sort(1).
This asymmetry between these coupled utilities suggests limited
use cases for separately ordered data.
Also the actual uniq specific logic that could be leveraged here
is quite small (as demonstrated by your simple script).
Finally, the more general technique of Decorate-Sort-Undecorate could
be used here: initially tag the data (with sed for example) based on a
regexp, and then manipulate the tagged data with coreutils.
Consequently I'd be against incorporating such regex functionality.
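
As a rough illustration of the decorate/undecorate idea, something
along these lines might do for the stack trace case. It is only a
sketch under a few assumptions: the function name uniq_by_regex is made
up, the regex is an ERE matched at the start of each line and spliced
into the sed script unescaped (so it must not contain "/"), GNU uniq is
needed for -w, and the input is assumed to be shorter than 10^8 lines.
No sort step is needed here since only adjacent lines are compared.

```sh
#!/bin/sh
# Hypothetical helper: collapse each run of consecutive lines matching
# the ERE in $1 (matched at the start of the line) to its first line.
uniq_by_regex() {
    # Decorate: prefix every line with a unique 8-digit key and "|".
    nl -ba -nrz -w8 -s'|' |
    # Re-tag matching lines with one constant 8-character key.
    sed -E "/^[0-9]{8}[|]($1)/ s/^[0-9]{8}/MATCHING/" |
    # Compare only the 8-character key, so adjacent matching lines
    # are treated as duplicates and unmatched lines never are.
    uniq -w 8 |
    # Undecorate: drop the key and the "|" separator.
    cut -d'|' -f2-
}

# Example: keep the message and only the first "at" frame.
printf '%s\n' \
    'java.lang.NumberFormatException: 12' \
    '        at java.lang.Integer.parseInt(Compiled Code)' \
    '        at java.lang.Integer.parseInt(Integer.java:390)' \
    '        at excepcion.ExcepcionApp.main(ExcepcionApp.java:8)' |
    uniq_by_regex '[[:space:]]*at '
```

Replacing the last stage with "uniq -c -w 8" or similar shows that the
tagged data can be handled by the existing uniq options, which is the
point of decorating rather than teaching uniq about regexes.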

thanks,
Pádraig.