bug-coreutils

bug#8081: sort -z and shuf -z don't work


From: Eric Blake
Subject: bug#8081: sort -z and shuf -z don't work
Date: Sat, 19 Feb 2011 06:57:24 -0700
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101209 Fedora/3.1.7-0.35.b3pre.fc14 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.7

On 02/19/2011 12:24 AM, Harald Dunkel wrote:
> Package: coreutils
> Version: 8.10
> 
> 
> Hi folks,
> 
> According to the man page "sort -z" and "shuf -z" are supposed to
> "end lines with 0 byte, not newline". This doesn't work. Example:

Thanks for the report, however this is not a bug.

sort -z implies that both input and output will be handled on
NUL-terminator boundaries, instead of the usual newline-terminator
boundaries.

> 
> % ( echo 1; echo 2; echo 3 ) | tac | sort -z

Up to this point, there are no NUL terminators in the input, so sort
sees only a single record, and there's nothing to sort.
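
A quick way to see this for yourself is the sketch below. The variable names are only for illustration, and it assumes a sort that supports -z. The NUL bytes are stripped before capturing the output so that the shell can hold it in a variable.

```shell
# The input has newline terminators only, so with -z the whole stream
# is a single record and the order of the digits cannot change.
with_z=$(printf '3\n2\n1\n' | sort -z | tr -d '\0' | tr '\n' ' ')

# Without -z each line is its own record, so the digits get sorted.
without_z=$(printf '3\n2\n1\n' | sort | tr '\n' ' ')

echo "sort -z: $with_z"
echo "sort:    $without_z"
```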

> | xargs -0 -L 1 echo xxx
> xxx 3
> 2
> 1

Likewise, xargs only gets a single record, explaining why you only get a
single xxx.
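
The same effect can be checked on xargs alone. This is a minimal sketch with made-up variable names; it uses -n 1 rather than -L 1 because with -0 every NUL-terminated item is exactly one argument, so the two should behave the same for this input.

```shell
# No NUL bytes in the input: xargs -0 sees one item, so echo runs once.
newline_runs=$(printf '3\n2\n1\n' | xargs -0 -n 1 echo xxx | grep -c '^xxx')

# A NUL after every item: three items, so echo runs three times.
nul_runs=$(printf '3\0002\0001\000' | xargs -0 -n 1 echo xxx | grep -c '^xxx')

echo "newline-terminated input: $newline_runs run(s)"
echo "NUL-terminated input:     $nul_runs run(s)"
```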

> 
> There are 3 lines on input, so there should be 3 lines with "xxx" on
> output.

No, sort only saw one NUL-terminated line on input (and not even that,
since you didn't provide a NUL-terminator).

> If I omit the -z, then it works:
> 
> % ( echo 1; echo 2; echo 3 ) | tac | sort | xargs -L 1 echo xxx
> xxx 1
> xxx 2
> xxx 3

Of course, because then you are using newline termination.

> 
> Please note that sort's input stream is not zero-terminated. "tac"
> doesn't support this option.

Maybe we should modify tac to add the -z option.  Would you care to
write a patch?

> sort(1) doesn't mention such an
> assumption, either. Obviously there are many more tools with
> this restriction.

Sort _did_ mention that the effect of -z is to handle lines based on NUL
termination - which implies both input and output.  Sort does NOT
convert between line termination styles, nor should it - since the whole
point of NUL-terminated records is that newlines can be embedded within
a record (matching the fact that you can sort filenames with embedded
newlines).  Converting line endings from newline to NUL or from NUL to
newline would give ambiguous output from sort's perspective; if you need
the conversion, then it should be done before sort's input or after
sort's output.
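
To illustrate why the conversion belongs outside of sort, here is a small sketch with two invented records that each contain an embedded newline. The final tr is only there to make the result printable: it turns the embedded newlines into spaces and the NUL terminators into newlines, which is exactly the kind of lossy choice sort could not make on your behalf.

```shell
# Two NUL-terminated records, each with a newline inside it.
# LC_ALL=C keeps the comparison byte-wise and independent of locale.
sorted=$(printf 'b\nsecond\000a\nfirst\000' \
  | LC_ALL=C sort -z \
  | tr '\n\0' ' \n')   # display only: newline -> space, NUL -> newline

printf '%s\n' "$sorted"
```

Each record stays in one piece through the sort; only the display step after it decides what to do with the embedded newlines.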

> 
> Instead of mixing input and output options I would suggest
> introducing two new tools, "nl2zero" and "zero2nl". Sample implementation:
> 
> % alias nl2zero='tr \\n \\0'
> % alias zero2nl='tr \\0 \\n'

Why should we add new tools, when you've already proven that a new alias
or simple shell function using existing tools (tr) can already do what
you require?
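
For use in scripts, where aliases are normally not expanded, the same idea can be written as shell functions. This is just a sketch that reuses the names you proposed; nothing here is part of coreutils beyond tr and sort.

```shell
# Convert newline-terminated records to NUL-terminated ones, and back.
nl2zero() { tr '\n' '\0'; }
zero2nl() { tr '\0' '\n'; }

# Convert before sort's input and after sort's output.
result=$(printf '3\n2\n1\n' | nl2zero | sort -z | zero2nl)
printf '%s\n' "$result"
```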

> % ( echo 1; echo 2; echo 3 ) | tac | nl2zero | sort -z | xargs -0 -L 1 echo xxx
> xxx 1
> xxx 2
> xxx 3

If anything, the only thing I've gotten from this post is that it would
be nice to teach tac about -z:

$ printf '1\0002\0003\000' | tac -z | sort -z | xargs -0 -L 1 echo xxx
xxx 1
xxx 2
xxx 3

-- 
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
