bug-coreutils

Re: new coreutil? shuffle - randomize file contents


From: James Youngman
Subject: Re: new coreutil? shuffle - randomize file contents
Date: Mon, 23 May 2005 17:35:27 +0100
User-agent: Mutt/1.3.28i

Davis Houlton writes:-


> I recently had to write a shuffle utility for a personal project and
> was wondering if it would make a candidate for the coreutils
> suite. It seems like the kind of utility the toolbox could use
> (maybe under section 3. Output of entire files).

This behaviour was proposed a few months ago as a new option to
"sort", and there were objections around the ideas of keeping the
shuffled sort stable (i.e. that lines with the same key should appear
in groups in the shuffled output) and of repeatability (e.g. giving a
'random seed' to ensure output is reproducible[*]).  Much discussion
followed and ended up with many people agreeing that this behaviour
properly belonged in a separate program.

So, I think that "shuffle" is a good idea.

[*] I'm against the 'random seed' idea since the number of
possible permutations of the output is so large for most reasonable
inputs that numbers as low as 2^32 are way too small.  That makes
parsing and making use of the seed quite tricky and in all likelihood
not worth the bother.
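
To put a number on the footnote above: a 32-bit seed can select at
most 2^32 distinct orderings, while an input of n lines has n!
possible orderings. The Python sketch below finds the smallest input
size at which a seed of a given width can no longer reach every
permutation. The function name is made up for illustration.

```python
import math

def smallest_n_exceeding(seed_bits):
    """Return the smallest n such that n! exceeds 2**seed_bits."""
    limit = 2 ** seed_bits
    n = 1
    # Grow n until the number of permutations outstrips the seed space.
    while math.factorial(n) <= limit:
        n += 1
    return n

if __name__ == "__main__":
    print(smallest_n_exceeding(32))  # 13, because 12! < 2**32 < 13!
    print(smallest_n_exceeding(64))  # 21, because 20! < 2**64 < 21!
```

So any input of 13 or more lines already has more orderings than a
32-bit seed can distinguish.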


> It's a fairly simple program, and while there is room for
> optimization, works fairly quickly enough on my hardware.

Does it produce all possible outputs with equal probability?
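
One standard way to get that property is the Fisher-Yates shuffle,
which yields every permutation with equal probability provided each
random index is drawn uniformly. The Python sketch below is only an
illustration of that technique; it is not taken from the proposed
program, and the function name is an assumption.

```python
import secrets

def fisher_yates_shuffle(items):
    """Return a shuffled copy of items using the Fisher-Yates method."""
    result = list(items)
    # Walk from the last slot down to the second, swapping each slot
    # with a uniformly chosen slot at or before it.
    for i in range(len(result) - 1, 0, -1):
        j = secrets.randbelow(i + 1)  # uniform integer in [0, i]
        result[i], result[j] = result[j], result[i]
    return result

if __name__ == "__main__":
    print(fisher_yates_shuffle(["a.ogg", "b.ogg", "c.ogg", "d.ogg"]))
```

A common mistake is to pick j from the whole range at every step.
That gives n^n equally likely swap sequences, which cannot map evenly
onto n! permutations for n > 2, so some outputs come up more often
than others.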

> SYNOPSIS
>        shuffle [--help] [FILE]...

I would suggest the addition of the following options:

        --null, -0, -z
                Accept and produce records terminated by the NUL
                character (in ASCII, this is 0) instead of newline.
                This can be used in conjunction with "find -print0"
                or with "xargs -0".   

        -o file
                Write the shuffled output data to the named file.
        
The reason for offering "-z" there as well is that "sort" has a "-z"
option that does this.   
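
As a rough illustration of what the NUL-terminated mode would mean,
here is a Python sketch that shuffles records separated by an
arbitrary terminator byte. The function name and the choice to always
terminate the final record are assumptions, not part of the proposal.

```python
import random

def shuffle_records(data, terminator=b"\n"):
    """Shuffle terminator-delimited records held in a bytes object."""
    records = data.split(terminator)
    # A trailing terminator leaves one empty piece at the end; drop it
    # so that it is not treated as a record of its own.
    if records and records[-1] == b"":
        records.pop()
    random.shuffle(records)
    # Every output record is terminated, including the last one.
    return b"".join(record + terminator for record in records)

if __name__ == "__main__":
    # File names containing newlines survive when NUL is the terminator.
    print(shuffle_records(b"a.ogg\0odd\nname.ogg\0c.ogg\0", b"\0"))
```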


>        Create a random multimedia playlist based on directory contents:
>               find /path/to/media -name '*.ogg' | shuffle > playlist.pls

Does that generate a playlist in the right format?  Based on a look at
http://gonze.com/playlists/playlist-format-survey.html#PLS it seems to
me like it's better to indicate that the output file is 'playlist.m3u'.


> EXIT STATUS
>        shuffle returns a zero upon successful completion; a one upon failing to
>        read a file and a two upon failing to write a file.

I realise that many writing style guides suggest that numbers less
than six or so should be represented as words rather than numerals,
but I found the paragraph above would have been more useful if
formatted a bit more like a table:-

   .SH EXIT STATUS
   .B shuffle
   exits with the following status:
   .nf
   0 if it succeeds
   1 if it fails to open or read an input file
   2 if it fails to write output data or fails to open the output file
   3 for command-line usage errors
   .fi
   The program can also exit with numerically greater status values if
   other errors occur, but most of these are operating-system
   dependent.
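
The status values in the table above could be wired up roughly as in
the following Python sketch. It is a toy stand-in, not the proposed
program: the constant names, the hand-rolled handling of "-o", and
the use of random.shuffle are all assumptions made for illustration.

```python
import random
import sys

EXIT_OK, EXIT_READ, EXIT_WRITE, EXIT_USAGE = 0, 1, 2, 3

def main(argv):
    """Shuffle the lines of the named files and return an exit status."""
    args = list(argv)
    out_path = None
    if args and args[0] == "-o":
        if len(args) < 2:
            return EXIT_USAGE  # "-o" given without a file name
        out_path = args[1]
        args = args[2:]
    if any(arg.startswith("-") for arg in args):
        return EXIT_USAGE  # unknown option
    lines = []
    try:
        for path in args:
            with open(path, "rb") as handle:
                lines.extend(handle)  # binary iteration splits on b"\n"
    except OSError:
        return EXIT_READ
    random.shuffle(lines)
    try:
        if out_path is None:
            sys.stdout.buffer.writelines(lines)
            sys.stdout.buffer.flush()
        else:
            with open(out_path, "wb") as out:
                out.writelines(lines)
    except OSError:
        return EXIT_WRITE
    return EXIT_OK

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```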


> Please let me know if this would be considered an appropriate
> addition, and what the next step might be.  I have used GNU utils
> for decades now, and am happy at the opportunity to finally have a
> chance to give back.

I would find this program useful.

Regards,
James.



