Re: comm: summary patch
From: Bob Proulx
Subject: Re: comm: summary patch
Date: Tue, 12 Jul 2005 13:39:26 -0600
User-agent: Mutt/1.5.9i
Paul Eggert wrote:
> Andrew Stribblehill writes:
> > It can sometimes be coded with awk, sure:
> >
> > #! /bin/sh
> > # usage: commsum <file(s)>
> >
> > awk '
> > BEGIN {t[0]=0; t[1]=0; t[2]=0}
> > {match($0,/^\t*/); t[RLENGTH]++}
> > END {printf "%d\t%d\t%d\n",t[0],t[1],t[2]}
> > ' "$@"
This is off topic, but since the shell script does nothing but call
awk, I would make it a pure awk script. There is no need for the
shell to be there at all. Use '#!/usr/bin/awk -f' (or an env-based
equivalent where the system supports it) and make the appropriate
script changes.
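As a rough sketch of that suggestion, the commands below write the same awk program into a standalone script. The file name commsum.awk is made up for illustration, and the '#!/usr/bin/awk -f' line assumes awk lives at that path on the local system.

```shell
# Write the awk program to its own file; the quoted 'EOF' keeps
# $0 and the backslash escapes from being expanded by the shell.
cat > commsum.awk <<'EOF'
#!/usr/bin/awk -f
# usage: comm file1 file2 | commsum.awk
# Count the lines in each comm column by the number of leading tabs.
BEGIN { t[0] = 0; t[1] = 0; t[2] = 0 }
{ match($0, /^\t*/); t[RLENGTH]++ }
END { printf "%d\t%d\t%d\n", t[0], t[1], t[2] }
EOF
chmod +x commsum.awk
```

It can also be run as "awk -f commsum.awk", which avoids depending on the interpreter path. It still has the leading-tab limitation Andrew mentions.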
> > However, this presumes that the input has no leading tabs in it.
>
> Yes, that's a problem: the output of comm is ambiguous. But how about
> if we solve this more-general problem instead of your particular one?
> That will let "comm" be useful in other situations.
>
> One way to solve the problem is by having an option that lets "comm"
> quote its output in some way, so that the output is not ambiguous.
> For example, it might quote leading tabs using "\t" and backslashes
> using "\\". Or perhaps you can think of a better approach.
I admit to being skeptical that such a quoting is really useful. It
would mean that something would have to read that quoting.
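For reference, the ambiguity that the quoting idea is meant to remove can be reproduced with two tiny cases. This is only an illustration; the file names and the temporary directory are made up for the example.

```shell
dir=$(mktemp -d)

# Case A: file 1 holds one line that itself begins with a tab.
printf '\tx\n' > "$dir/a1"
: > "$dir/a2"

# Case B: file 2 holds the plain line "x", so comm prints it in column 2.
: > "$dir/b1"
printf 'x\n' > "$dir/b2"

case_a=$(comm "$dir/a1" "$dir/a2")
case_b=$(comm "$dir/b1" "$dir/b2")

# Both cases print a tab followed by "x", so a reader of the output
# cannot tell which column the line belongs to.
[ "$case_a" = "$case_b" ] && echo "identical output"

rm -r "$dir"
```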
> > there's no way to avoid that, short of preprocessing:
>
> How about this?
>
> echo $(comm -23 f1 f2) $(comm -13 f1 f2) $(comm -12 f1 f2)
>
> Admittedly it's not as efficient as one might like, but is there
> really much of an efficiency issue here?
Personally I think a multipass approach would be fine too. Of course,
as soon as someone makes an argument for reading directly from a pipe,
a multipass approach becomes problematic. But is there really a case
for reading from a pipe here? I think we would be optimizing for a
0.05% use case.
By the way... I think you meant to use wc -l here too, right?
  seq 10 19 > f1
  seq 16 25 > f2
  echo $(comm -23 f1 f2 | wc -l) $(comm -13 f1 f2 | wc -l) $(comm -12 f1 f2 | wc -l)
  6 6 4
And of course any single value is very simple.
  comm -23 f1 f2 | wc -l
  6
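The multipass approach could be wrapped up along these lines. This is only a sketch: the function name commsum is borrowed from Andrew's script, and it assumes both arguments are sorted regular files that can be read three times.

```shell
# usage: commsum file1 file2
# Prints three counts: lines only in file1, lines only in file2,
# and lines common to both.
commsum() {
    only1=$(comm -23 "$1" "$2" | wc -l)
    only2=$(comm -13 "$1" "$2" | wc -l)
    both=$(comm -12 "$1" "$2" | wc -l)
    # Left unquoted on purpose so that any padding some wc
    # implementations add around the number is dropped.
    echo $only1 $only2 $both
}
```

With the f1 and f2 files created above, "commsum f1 f2" prints "6 6 4". Because each file is read three times, this does not work when an input comes from a pipe.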
> > Does anyone else agree with me, or shall I just crawl back under my
> > rock? ;)
>
> Let's see whether anyone else chimes in.
>
> This email exchange is archived, so perhaps someone will read it in
> 2010 and say "Hey, Andrew was right!" and fix things....
I am skeptical of how much use an option like this would really get.
And it is pretty straightforward to code a similar solution today;
the only disadvantage is that it is multipass.
Bob