coreutils

RE: wc enhancement possibility


From: Allan Chandler
Subject: RE: wc enhancement possibility
Date: Thu, 30 Jun 2016 09:34:15 +0000

Agreed, I'd be rather confused if the sum of "wc -l <file1" and "wc -l <file2" 
was not the same as "cat file1 file2 | wc -l". That's why I thought it would be 
better as a separate operation.

And I don't doubt you could generate a pipeline that would do the job (piping 
it through "awk '{print}'" also seems to do the trick for me) but my point was 
that counting things was the raison d'être of "wc", so it would be better added 
to _that_ program. You could just as well argue that "sed" shouldn't have an 
in-place editing option because you can do the same thing with a "mv" after the 
event :-)
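
For anyone wanting to try the awk variant, here is a minimal sketch. The function name count_all_lines is hypothetical, chosen only for illustration; it relies on awk's print writing each record followed by a newline, so an unterminated last line gains a terminator before "wc" sees it.

```shell
# Hypothetical helper (the name is an assumption, not an existing tool):
# count lines including an unterminated final line.
count_all_lines() {
    # awk re-emits every record followed by a newline (its default ORS),
    # so wc -l then sees one newline per line, partial or not.
    awk '{print}' | wc -l
}

printf 'one\ntwo' | wc -l             # counts newline characters only
printf 'one\ntwo' | count_all_lines   # also counts the partial last line
```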

I assume by flag option you meant that, rather than a new counting option like 
"--lines-including-partial-last", it may be better to have something that 
modified the behaviour of the existing "--lines", such as 
"--count-incomplete-last-line" or something like that. I have no issue with the 
mechanics so that sounds fine, my only desire was that there should be _some_ 
way to get the information from "wc".

Cheers.

-----Original Message-----
From: Pádraig Brady [mailto:address@hidden] 
Sent: Thursday, 30 June 2016 4:47 PM
To: Allan Chandler; address@hidden
Subject: Re: wc enhancement possibility

On 30/06/16 02:52, Allan Chandler wrote:
> Good arbitrary-time-of-day, people.
> 
> I helped a colleague out today with a "wc" problem they were having with line 
> counts when the final line of a file did not have a newline at the end of it.
> 
> Now this is technically not a bug since the doco explicitly states that "wc 
> --lines/-l" gives the count of newline characters, not the count of lines. 
> And, in any case, it could be argued that the definition of a line SHOULD be 
> "zero or more characters followed by a newline".
> 
> However, this has caused confusion before in that a non-terminated final line 
> COULD be considered a line, especially if you're just outputting the file.
> 
> I don't propose changing the behaviour of "--lines" since that would 
> result in chaos for a large number of scripts in the world currently 
> using it, and I don't wish to spend the rest of my life fighting off 
> affected parties, Omega-Man-against-the-zombies style, because of the 
> trouble I caused :-)
> 
> However, I wonder whether it would be worthwhile adding another option which 
> included a final non-terminated line, something like "--lines-all".
> 
> I've seen some "wc" suggestions turned down in the past 
> (https://www.gnu.org/software/coreutils/rejected_requests.html) but these 
> seem to generally be requests for things that other tools are better to 
> provide.
> 
> Keeping in mind the philosophy of UNIX's "a tool should do one thing and do 
> it well", and the fact that the purpose of "wc" is most definitely counting 
> things, it appears it may be a better fit in the "wc" program itself rather 
> than doing it as part of a pipeline.
> 
> Anyway, I'm really just raising it as a discussion point. Tell me what you 
> think...

Maybe.

Note one of the reasons wc -l doesn't count a non \n terminated line at end of 
file is so that counts are accurate for split files for example.
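
To illustrate the split-file point, a rough sketch follows. The file names and contents are made up for the example; the pieces are cut in the middle of a line, the way a byte-based split could leave them.

```shell
# Sketch with made-up data: a three-line stream cut mid-line into two pieces.
tmp=$(mktemp -d)
printf 'alpha\nbra' > "$tmp/part1"       # ends in the middle of a line
printf 'vo\ncharlie\n' > "$tmp/part2"    # starts with the rest of that line

# Newline counts are additive across the pieces.
whole=$(cat "$tmp/part1" "$tmp/part2" | wc -l)
sum=$(( $(wc -l < "$tmp/part1") + $(wc -l < "$tmp/part2") ))
echo "whole=$whole sum=$sum"

# Counting the partial last line of each piece counts the cut line twice.
sum_all=$(( $(awk '{print}' "$tmp/part1" | wc -l) + $(awk '{print}' "$tmp/part2" | wc -l) ))
echo "sum_all=$sum_all"

rm -rf "$tmp"
```

With newline counting the per-piece totals match the count for the whole stream; once partial last lines are included, the line that straddles the cut is counted in both pieces.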

If we were to add an option it would be a flag type option rather than 
selecting a different mode.
But it mightn't be too much overhead to pre-process the data?
I.E. something like:

  wc-all-lines() { sed '$a\' | wc -l; }
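
A quick way to check how that behaves, as a sketch only: the function is renamed wc_all_lines here (an assumption made for portability, since hyphens in function names are not accepted by every shell), and it assumes GNU sed, where '$a\' with no text appends nothing but still supplies a missing final newline. Other sed implementations may reject the expression.

```shell
# Renamed variant of the function above; assumes GNU sed.
wc_all_lines() { sed '$a\' | wc -l; }

printf 'one\ntwo'   | wc_all_lines   # unterminated last line is counted
printf 'one\ntwo\n' | wc_all_lines   # terminated input is not over-counted
printf ''           | wc_all_lines   # empty input still counts as zero
```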

cheers,
Pádraig


