coreutils

Re: RFE: head,tail: -z, --zero-terminated


From: Pádraig Brady
Subject: Re: RFE: head,tail: -z, --zero-terminated
Date: Fri, 8 Jan 2016 21:46:59 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0

On 08/01/16 19:20, Assaf Gordon wrote:
> Hello Pádraig and all,
> 
> Regarding the "wc", I think the following is missing:
> ===
> diff --git a/src/wc.c b/src/wc.c
> index ea2c20f..0e9a849 100644
> --- a/src/wc.c
> +++ b/src/wc.c
> @@ -651,7 +651,7 @@ main (int argc, char **argv)
>     print_linelength = false;
>     total_lines = total_words = total_chars = total_bytes = max_line_length = 0;
>   
> -  while ((optc = getopt_long (argc, argv, "clLmw", longopts, NULL)) != -1)
> +  while ((optc = getopt_long (argc, argv, "clLmwz", longopts, NULL)) != -1)
>       switch (optc)
>         {
>         case 'c':
> ===

Heh I was rushing that last one.

Yes you're right.
The test didn't fail due to an existing issue in the test,
which is fixed up with:

diff --git a/tests/misc/wc-files0.sh b/tests/misc/wc-files0.sh
index 6220e72..c117e81 100755
--- a/tests/misc/wc-files0.sh
+++ b/tests/misc/wc-files0.sh
@@ -33,19 +33,19 @@ EOF

 compare exp out || fail=1

-if test "$fail" = ''; then
+if ! test "$fail" = 1; then
   # Repeat the above test, but read the file name list from stdin.
   rm -f out
   wc --files0-from=- < names > out || fail=1
   compare exp out || fail=1
 fi

-if test "$fail" = ''; then
+if ! test "$fail" = 1; then
   # Repeat the above test, but output NULs instead of newlines
   rm -f out
   tr '\n' '\0' < exp > zexp || framework_failure_
   wc -z --files0-from=- < names > out || fail=1
-  compare exp out || fail=1
+  compare zexp out || fail=1
 fi
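To illustrate why the old guard could hide the wrong comparison, here is a
minimal sketch. It assumes (this is not stated above) that "fail" holds 0
rather than the empty string when nothing has failed yet; old_guard and
new_guard are hypothetical names used only for this illustration.

```shell
#!/bin/sh
# Assumption: the test framework leaves fail=0 (not empty) on success.
fail=0

# Old guard: true only when fail is the empty string.
old_guard() { test "$fail" = ''; }

# New guard: true unless a failure has been recorded (fail is empty or 0).
new_guard() { ! test "$fail" = 1; }

if old_guard; then
  echo "old guard: follow-up checks run"
else
  echo "old guard: follow-up checks skipped"
fi

if new_guard; then
  echo "new guard: follow-up checks run"
else
  echo "new guard: follow-up checks skipped"
fi
```

Under that assumption the old guard skips the follow-up blocks entirely,
so the "compare exp out" on NUL-terminated output was never reached.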

> Also,
> If I understand what the patch does, it only changes the delimiter of the output line,
> but still treats NL as the input delimiter?
> 
> e.g. the following gives 2 instead of 1:
> 
>      $ printf "a\nb\nc\x00" | ./src/wc -z -l
>      2
> 
> I can understand if the goal is to be strict POSIX-like (and always count NLs, not logical lines),
> but this can lead to counter-intuitive results when combined with other programs.
> 
> Since this returns 10:
> 
>      $ cat FOO | head -n10 | wc -l
> 
> I would naively expect this to return 10 as well:
> 
>      $ cat null-terminated-FOO | head -z -n10 | wc -z -l
> 
> And similarly:
> 
>      $ cat FOO | tr '\n' '\0' | head -z -n10 | wc -z -l
> 
> but currently it will return 0 (or as many embedded NLs as are in the input, but not 10).
> 
> What do you think?

Yes, wc is "special" in that it outputs file names
and processes those files line by line.
I was just considering the former case (with \n in file names).

Now these cases are separate really.
I.e. you could definitely want to handle \n in file names
while treating input as standard \n delimited lines.
So we don't want -z to do both at least.

What we could do is to unconditionally escape the file name
if it contains a \n.
That would not have backwards compat issues as a \n in a file name
would currently break programmatic parsing anyway.
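As a rough sketch of what such escaping could look like (this is only an
illustration in POSIX shell, not what wc does or would do; escape_nl is a
hypothetical helper name), each newline in a name is replaced by the two
characters backslash and "n":

```shell
#!/bin/sh
# A literal newline, used as the pattern to search for.
nl='
'

# Hypothetical helper: print $1 with every newline replaced by "\n".
# A real scheme would also need to escape backslashes to stay unambiguous.
escape_nl() {
  s=$1
  out=
  while :; do
    case $s in
      *"$nl"*)
        out=$out${s%%"$nl"*}'\n'   # text before the first newline, then \n
        s=${s#*"$nl"}              # continue with the text after it
        ;;
      *) break ;;
    esac
  done
  printf '%s\n' "$out$s"
}

escape_nl "odd${nl}name.txt"
```

A name without a newline passes through unchanged, so only names that
would already confuse a line-based parser are affected.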

As to the question of having -z control wc input processing,
given that wc is very text oriented, with assumptions
on word and line boundaries, I wonder whether NUL input
processing is useful for wc at all. I'll drop it for now.
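For what it's worth, NUL-terminated records can already be counted with
existing tools, without any new wc option. A minimal sketch, where
count_nul_records is a hypothetical name for this illustration:

```shell
#!/bin/sh
# Count NUL-terminated records on stdin: delete every byte except NUL,
# then count the bytes that remain.
count_nul_records() {
  tr -cd '\000' | wc -c
}

# One NUL-terminated record containing two embedded newlines.
printf 'a\nb\nc\000' | count_nul_records
```

This counts 1 record for the input above, where "wc -l" counts the 2
embedded newlines.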

cheers,
Pádraig.



