bug-textutils


Re: Diff to fix wc's definition "word" to match other GNU tools.


From: Bob Proulx
Subject: Re: Diff to fix wc's definition "word" to match other GNU tools.
Date: Wed, 18 Apr 2001 23:02:24 -0600

> Below is a diff to modify 'wc' in order to fix wc's concept of a "word" to
> match the definition of "word" used by egrep(1), regex(7), regcomp(3), perl,
> and other tools. Specifically, the definition of a "word" is a sequence of
> alphanumeric or underscore characters.

Thanks for the report.  Unfortunately for your purposes, the current
behavior is exactly what wc is defined to do, and it can't be changed.

> The wc tool, however, considers anything other than \n, \r, \t, \f, \v, and
> <space> to be valid word-characters. For example, the text,
> "The-rain+in(Spain)falls*mainly{on}the......plain", is only one word,
> according to `wc'. The tools mentioned above will report nine words.
> 
> Good news! The fix is ridiculously simple -- only a few lines! Here is the
> output of cvs diff on the file, "textutils-XXX/src/wc.c" :

The wc program has been around for 25+ years and is part of the UNIX
standards.  If the output were changed, it would break many programs
that rely on it being the way it is.  For example, many shar(1)
programs use wc as a checksum.  We all realize that wc is a terrible
integrity check and that bit errors would pass through a wc check
unnoticed.  But just the same, we don't want to break those programs
that rely upon the standard output.  Those of us who are used to a
stable operating system that does not change core functionality too
often would get apoplexy.

Check out the online standards documentation at:

  http://www.unix-systems.org/single_unix_specification_v2/xcu/wc.html

    "The wc utility considers a word to be a non-zero-length string of
    characters delimited by white space."
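
To make the difference between the two definitions concrete, here is a
minimal sketch in Python (not the wc source, and not the proposed diff).
The function names are made up for illustration.  It assumes ASCII input
and treats "white space" as the six ASCII whitespace characters; the real
wc may behave differently in other locales.

```python
import re

def count_words_whitespace(text: str) -> int:
    """Count words the way the standard describes: non-empty runs of
    characters delimited by white space."""
    # bytes.split() with no argument splits on runs of ASCII whitespace
    # (space, \t, \n, \r, \v, \f) and drops empty strings.
    return len(text.encode("ascii").split())

def count_words_alnum(text: str) -> int:
    """Count words as runs of ASCII letters, digits, or underscore,
    the definition the original report asks for."""
    return len(re.findall(r"[A-Za-z0-9_]+", text))

sample = "The-rain+in(Spain)falls*mainly{on}the......plain"
print(count_words_whitespace(sample))  # 1
print(count_words_alnum(sample))       # 9
```

The sample string contains no white space at all, so the standard
definition sees one word, while the alphanumeric definition sees nine.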

Thanks though...

Bob


