Re: [PATCH] split: --chunks option
From: Chen Guo
Subject: Re: [PATCH] split: --chunks option
Date: Sun, 3 Jan 2010 12:37:13 -0800 (PST)
Hi all, hope everyone had happy holidays.
Here's the patch in its entirety. Let me know if anything's not
satisfactory.
I should note that I went easy on the tests because the other
split tests didn't seem all too comprehensive themselves. Please
let me know if I need to be more exhaustive.
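
As an illustration of the K/N byte chunking described in the commit message
below, the chunk boundaries can be sketched in isolation. This is a minimal
standalone sketch, not code from the patch: the helper name chunk_bounds is
hypothetical. It assumes the same arithmetic as bytes_chunk_extract, where
every chunk is file_size / N bytes long and the last chunk absorbs the
remainder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the patch): compute the half-open
   byte range [*start, *end) of chunk K (1-based) out of TOTAL chunks
   of a FILE_SIZE-byte file.  */
static void
chunk_bounds (size_t k, size_t total, size_t file_size,
              size_t *start, size_t *end)
{
  size_t chunk;

  assert (total != 0 && 1 <= k && k <= total);

  /* Integer division: each chunk gets the same base size.  */
  chunk = file_size / total;
  *start = (k - 1) * chunk;
  /* The last chunk runs to end of file, picking up the remainder.  */
  *end = (k == total) ? file_size : k * chunk;
}
```

With the 10-byte input used in tests/misc/split-bchunk and TOTAL = 3, this
yields [0, 3), [3, 6) and [6, 10), which agrees with the expected files
exp-1 through exp-3.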
>From fb783060ece188fdbcd805381d02eb3b0477d25a Mon Sep 17 00:00:00 2001
From: Chen Guo <address@hidden>
Date: Sun, 3 Jan 2010 11:16:09 -0800
Subject: [PATCH] split: divide file into equal sized chunks; add -r and -t
options.
Extend --bytes and --lines to divide file into N equal pieces, or
extract Kth of N said pieces. Add -n/--number alias for BSD
compatibility.
Add -r/--round-robin option to allow division and extraction of
chunks in round robin fashion, in support of nonseekable files.
Add -t/--term option to allow user to choose delineation character;
supports parsing C escape sequences such as \n or \xdd.
* doc/coreutils.texi: update documentation of split.
* src/split.c: (eol): new global variable.
(usage, longopts, main): new options -n/--number, -r, and -t.
(bytes_split): add max_files argument. This allows for trivial
implementation for byte chunking, similar to BSD.
(lines_split, line_bytes_split): delineate line by global eol char
instead of '\n'.
(lines_chunk_split): new function. Split file into eol delineated
chunks.
(bytes_chunk_extract): new function. Extract a chunk of file.
(lines_chunk_extract): new function. Extract an eol delineated chunk
of file.
(of_info): new struct. Used by new functions lines_rr and ofd_check
to keep track of file descriptors associated with output files.
(ofd_check): new function. Shuffle file descriptors in case output
files outnumber available file descriptors.
(lines_rr): new function. Split file into chunks in round-robin
fashion.
(lines_rr_extract): new function. Extract a chunk of file, as if
chunks were created in round-robin fashion.
(chunk_parse): new function. Parses /N and K/N syntax.
(eol_parse): new function. Parses -t option argument.
* tests/Makefile.am: add new tests.
* tests/misc/split-bchunk: new test for byte delineated chunking.
* tests/misc/split-fail: add failure scenarios for new options.
* tests/misc/split-l: change typo ln --version to split --version.
* tests/misc/split-lchunk: new test for line delineated chunking.
* tests/misc/split-rchunk: new test for round-robin chunking.
* tests/misc/split-t: new test for user defined eol char.
---
doc/coreutils.texi | 57 ++++-
src/split.c | 595 ++++++++++++++++++++++++++++++++++++++++++++++-
tests/Makefile.am | 4 +
tests/misc/split-bchunk | 46 ++++
tests/misc/split-fail | 8 +
tests/misc/split-l | 2 +-
tests/misc/split-lchunk | 56 +++++
tests/misc/split-rchunk | 56 +++++
tests/misc/split-t | 39 +++
9 files changed, 841 insertions(+), 22 deletions(-)
create mode 100755 tests/misc/split-bchunk
create mode 100755 tests/misc/split-lchunk
create mode 100755 tests/misc/split-rchunk
create mode 100755 tests/misc/split-t
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 444dbc7..ac022f4 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -104,7 +104,7 @@
* shuf: (coreutils)shuf invocation. Shuffling text files.
* sleep: (coreutils)sleep invocation. Delay for a specified time.
* sort: (coreutils)sort invocation. Sort text files.
-* split: (coreutils)split invocation. Split into fixed-size pieces.
+* split: (coreutils)split invocation. Split into pieces.
* stat: (coreutils)stat invocation. Report file(system) status.
* stdbuf: (coreutils)stdbuf invocation. Modify stdio buffering.
* stty: (coreutils)stty invocation. Print/change terminal settings.
@@ -2623,7 +2623,7 @@ These commands output pieces of the input.
@menu
* head invocation:: Output the first part of files.
* tail invocation:: Output the last part of files.
-* split invocation:: Split a file into fixed-size pieces.
+* split invocation:: Split a file into pieces.
* csplit invocation:: Split a file into context-determined pieces.
@end menu
@@ -2919,15 +2919,15 @@ mean either @samp{tail ./+4} or @samp{tail -n +4}.
@node split invocation
-@section @command{split}: Split a file into fixed-size pieces
+@section @command{split}: Split a file into pieces.
@pindex split
@cindex splitting a file into pieces
@cindex pieces, splitting a file into
-@command{split} creates output files containing consecutive sections of
-@var{input} (standard input if none is given or @var{input} is
-@samp{-}). Synopsis:
+@command{split} creates output files containing consecutive or interleaved
+sections of @var{input} (standard input if none is given or @var{input}
+is @samp{-}). Synopsis:
@example
split address@hidden address@hidden address@hidden
@@ -2940,10 +2940,9 @@ left over for the last section), into each output file.
The output files' names consist of @var{prefix} (@samp{x} by default)
followed by a group of characters (@samp{aa}, @samp{ab}, @dots{} by
default), such that concatenating the output files in traditional
-sorted order by file name produces
-the original input file. If the output file names are exhausted,
-@command{split} reports an error without deleting the output files
-that it did create.
+sorted order by file name produces the original input file (except
address@hidden). If the output file names are exhausted, @command{split}
+reports an error without deleting the output files that it did create.
The program accepts the following options. Also see @ref{Common options}.
@@ -2959,6 +2958,13 @@ For compatibility @command{split} also supports an obsolete
option syntax @address@hidden New scripts should use @option{-l
@var{lines}} instead.
address@hidden -l address@hidden/@var{chunks}
address@hidden address@hidden/@var{chunks}
+If @var{k} is zero or omitted, divide @var{input} into @var{chunks}
+roughly equal-sized line delineated chunks.
+
+If @var{k} is present and nonzero, write only the @var{k}th such chunk to standard output.
+
@item -b @var{size}
@itemx address@hidden
@opindex -b
@@ -2966,6 +2972,13 @@ option syntax @address@hidden New scripts should use @option{-l
Put @var{size} bytes of @var{input} into each output file.
@multiplierSuffixes{size}
address@hidden -b address@hidden/@var{chunks}
address@hidden address@hidden/@var{chunks}
+If @var{k} is zero or omitted, divide @var{input} into @var{chunks}
+equal-sized chunks; the last chunk also receives any remainder bytes.
+
+If @var{k} is present and nonzero, write only the @var{k}th such chunk to standard output.
+
@item -C @var{size}
@itemx address@hidden
@opindex -C
@@ -2975,6 +2988,30 @@ possible without exceeding @var{size} bytes. Individual lines longer than
@var{size} bytes are broken into multiple files.
@var{size} has the same format as for the @option{--bytes} option.
address@hidden -n address@hidden/address@hidden
address@hidden --number address@hidden/address@hidden
address@hidden -n
address@hidden --number
+Same as @address@hidden/@var{chunks}}, for BSD compatibility.
+
address@hidden -r address@hidden/address@hidden
address@hidden --round-robin address@hidden/address@hidden
address@hidden -r
address@hidden --round-robin
+If @var{k} is zero or omitted, distribute @var{input} lines round-robin
+style into @var{chunks} output files.
+
+If @var{k} is present and nonzero, write only the @var{k}th such chunk to standard output.
+
address@hidden -t @var{char}
address@hidden --term @var{char}
address@hidden -t
address@hidden --term
+Set @var{char} as the end of line character. Supports C escape sequences.
+Using this option with @option{-b @var{size}} is equivalent to
address@hidden @var{size}}, and with @option{-b address@hidden/@var{chunks}} is
+equivalent to @option{-l address@hidden/@var{chunks}}.
+
@item -a @var{length}
@itemx address@hidden
@opindex -a
diff --git a/src/split.c b/src/split.c
index 5bd9ebb..b1272c4 100644
--- a/src/split.c
+++ b/src/split.c
@@ -17,8 +17,7 @@
/* By address@hidden, with rms.
To do:
- * Implement -t CHAR or -t REGEX to specify break characters other
- than newline. */
+ * Extend -t CHAR to -t REGEX */
#include <config.h>
@@ -72,6 +71,9 @@ static int output_desc;
output file is opened. */
static bool verbose;
+/* End of line character */
+static char eol;
+
/* For long options that have no equivalent short option, use a
non-character as a pseudo short option, starting with CHAR_MAX + 1. */
enum
@@ -84,8 +86,11 @@ static struct option const longopts[] =
{"bytes", required_argument, NULL, 'b'},
{"lines", required_argument, NULL, 'l'},
{"line-bytes", required_argument, NULL, 'C'},
+ {"number", required_argument, NULL, 'n'},
+ {"round-robin", required_argument, NULL, 'r'},
{"suffix-length", required_argument, NULL, 'a'},
{"numeric-suffixes", no_argument, NULL, 'd'},
+ {"term", required_argument, NULL, 't'},
{"verbose", no_argument, NULL, VERBOSE_OPTION},
{GETOPT_HELP_OPTION_DECL},
{GETOPT_VERSION_OPTION_DECL},
@@ -116,9 +121,23 @@ Mandatory arguments to long options are mandatory for short options too.\n\
fprintf (stdout, _("\
-a, --suffix-length=N use suffixes of length N (default %d)\n\
-b, --bytes=SIZE put SIZE bytes per output file\n\
+ -b, --bytes=/N generate N output files\n\
+ -b, --bytes=K/N print Kth of N chunks of file\n\
-C, --line-bytes=SIZE put at most SIZE bytes of lines per output file\n\
-d, --numeric-suffixes use numeric suffixes instead of alphabetic\n\
-l, --lines=NUMBER put NUMBER lines per output file\n\
+ -l, --lines=/N generate N eol delineated output files\n\
+ -l, --lines=K/N print Kth of N eol delineated chunks\n\
+ -n, --number=N same as --bytes=/N\n\
+ -n, --number=K/N same as --bytes=K/N\n\
+ -r, --round-robin=N generate N eol delineated output files using\n\
+ round-robin style distribution.\n\
+ -r, --round-robin=K/N print Kth of N eol delineated chunks as -rN would\n\
+ have generated.\n\
+ -t, --term=CHAR specify CHAR as eol. This will also convert\n\
+ -b to its line delineated equivalent (-C if\n\
+ splitting normally, -l if splitting by\n\
+ chunks). C escape sequences are accepted.\n\
"), DEFAULT_SUFFIX_LENGTH);
fputs (_("\
--verbose print a diagnostic just before each\n\
@@ -218,13 +237,14 @@ cwrite (bool new_file_flag, const char *bp, size_t bytes)
Use buffer BUF, whose size is BUFSIZE. */
static void
-bytes_split (uintmax_t n_bytes, char *buf, size_t bufsize)
+bytes_split (uintmax_t n_bytes, char *buf, size_t bufsize, uintmax_t max_files)
{
size_t n_read;
bool new_file_flag = true;
size_t to_read;
uintmax_t to_write = n_bytes;
char *bp_out;
+ uintmax_t opened = 1;
do
{
@@ -251,7 +271,8 @@ bytes_split (uintmax_t n_bytes, char *buf, size_t bufsize)
cwrite (new_file_flag, bp_out, w);
bp_out += w;
to_read -= w;
- new_file_flag = true;
+ new_file_flag = (opened++ < max_files || !max_files)?
+ true : false;
to_write = n_bytes;
}
}
@@ -277,10 +298,10 @@ lines_split (uintmax_t n_lines, char *buf, size_t bufsize)
error (EXIT_FAILURE, errno, "%s", infile);
bp = bp_out = buf;
eob = bp + n_read;
- *eob = '\n';
+ *eob = eol;
for (;;)
{
- bp = memchr (bp, '\n', eob - bp + 1);
+ bp = memchr (bp, eol, eob - bp + 1);
if (bp == eob)
{
if (eob != bp_out) /* do not write 0 bytes! */
@@ -340,7 +361,7 @@ line_bytes_split (size_t n_bytes)
bp = buf + n_buffered;
if (n_buffered == n_bytes)
{
- while (bp > buf && bp[-1] != '\n')
+ while (bp > buf && bp[-1] != eol)
bp--;
}
@@ -362,6 +383,328 @@ line_bytes_split (size_t n_bytes)
free (buf);
}
+/* Split into NUMBER eol chunks. */
+
+static void
+lines_chunk_split (size_t number, char *buf, size_t bufsize, size_t file_size)
+{
+ size_t n_read;
+ size_t chunk_no = 1;
+ off_t chunk_end = file_size / number - 1;
+ off_t offset = 0;
+ bool new_file_flag = true;
+ char *bp, *bp_out, *eob;
+
+ while (offset < file_size)
+ {
+ n_read = full_read (STDIN_FILENO, buf, bufsize);
+ if (n_read == SAFE_READ_ERROR)
+ error (EXIT_FAILURE, errno, "%s", infile);
+ bp = buf;
+ eob = buf + n_read;
+
+ while (1)
+ {
+ /* Begin looking for eol at last byte of chunk. */
+ bp_out = (offset < chunk_end)? bp + chunk_end - offset : bp;
+ if (bp_out > eob)
+ bp_out = eob;
+ bp_out = memchr (bp_out, eol, eob - bp_out);
+ if (!bp_out)
+ {
+ /* Buffer exhausted. */
+ cwrite (new_file_flag, bp, eob - bp);
+ new_file_flag = false;
+ offset += eob - bp;
+ break;
+ }
+ else
+ bp_out++;
+
+ cwrite (new_file_flag, bp, bp_out - bp);
+ chunk_end = (++chunk_no < number)?
+ chunk_end + file_size / number : file_size;
+ new_file_flag = true;
+ offset += bp_out - bp;
+ bp = bp_out;
+ /* A line could have been so long that it skipped
+ entire chunks. */
+ while (chunk_end < offset)
+ {
+ chunk_end += file_size / number;
+ chunk_no++;
+ /* Create blank file: this ensures NUMBER files are
+ created. */
+ cwrite (true, bp, 0);
+ }
+ }
+ }
+}
+
+/* Extract Nth of TOTAL chunks. */
+
+static void
+bytes_chunk_extract (size_t n, size_t total, char *buf, size_t bufsize,
+ size_t file_size)
+{
+ off_t start = (n == 0)? 0 : (n - 1) * (file_size / total);
+ off_t end = (n == total)? file_size : n * (file_size / total);
+ ssize_t n_read;
+ size_t n_write;
+
+ while (1)
+ {
+ n_read = pread (STDIN_FILENO, buf, bufsize, start);
+ if (n_read < 0)
+ error (EXIT_FAILURE, errno, "%s", infile);
+ n_write = (start + n_read <= end)? n_read : end - start;
+ if (full_write (STDOUT_FILENO, buf, n_write) != n_write)
+ error (EXIT_FAILURE, errno, "output error");
+ start += n_read;
+ if (end <= start)
+ return;
+ }
+}
+
+/* Extract lines whose first byte is in the Nth of TOTAL chunks. */
+
+static void
+lines_chunk_extract (size_t n, size_t total, char* buf, size_t bufsize,
+ size_t file_size)
+{
+ ssize_t n_read;
+ bool end_of_chunk = false;
+ bool skip = true;
+ char *bp = buf, *bp_out = buf, *eob;
+ off_t start;
+ off_t end;
+
+ /* For n != 1, start reading 1 byte before nth chunk of file. This is to
+ detect if the first byte of chunk is the first byte of a line. */
+ if (n == 1)
+ {
+ start = 0;
+ skip = false;
+ }
+ else
+ start = (n - 1) * (file_size / total) - 1;
+ end = (n == total)? file_size - 1 : n * (file_size / total) - 1;
+
+ do
+ {
+ n_read = pread (STDIN_FILENO, buf, bufsize, start);
+ if (n_read < 0)
+ error (EXIT_FAILURE, errno, "%s", infile);
+ bp = buf;
+ bp_out = buf + n_read;
+ eob = bp_out;
+
+ /* Find starting point. */
+ if (skip)
+ {
+ bp = memchr (buf, eol, n_read);
+ if (bp && bp - buf < end - start)
+ {
+ bp++;
+ skip = false;
+ }
+ else if (!bp && start + n_read < end)
+ {
+ start += n_read;
+ continue;
+ }
+ else
+ return;
+ }
+
+ /* Find ending point. */
+ if (end < start + n_read && end == file_size - 1)
+ end_of_chunk = true;
+ else if (start + n_read >= end)
+ {
+ bp_out = (buf + end - start < buf)? buf : buf + end - start;
+ bp_out = memchr (bp_out, eol, eob - bp_out);
+ if (bp_out)
+ {
+ bp_out++;
+ end_of_chunk = true;
+ }
+ else
+ bp_out = eob;
+ }
+
+ if (full_write (STDOUT_FILENO, bp, bp_out - bp) != bp_out - bp)
+ error (EXIT_FAILURE, errno, "output error");
+ start += n_read;
+ }
+ while (!end_of_chunk);
+}
+
+
+
+typedef struct of_info
+{
+ char *of_name;
+ int ofd;
+} of_t;
+
+/* Rotates file descriptors when we're writing to more output files than we
+ have available file descriptors. */
+
+static void
+ofd_check (of_t *ofiles, size_t i, size_t n)
+{
+ if (0 < ofiles[i].ofd)
+ return;
+ else
+ {
+ int fd;
+ int j = i - 1;
+
+ /* Another process could have opened a file in between the calls to
+ close and open, so we should keep trying until open succeeds or
+ we've closed all of our files. */
+ while (1)
+ {
+ /* Attempt to open file. */
+ fd = open (ofiles[i].of_name,
+ O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
+ (S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP
+ | S_IROTH | S_IWOTH));
+ if (-1 < fd)
+ break;
+ /* Find an open file to close. */
+ while (ofiles[j].ofd < 0)
+ {
+ if (--j == 0)
+ j = n - 1;
+ /* No more open files to close, exit with failure. */
+ if (j == i)
+ error (EXIT_FAILURE, 0, "%s", ofiles[i].of_name);
+ }
+ close (ofiles[j].ofd);
+ }
+ ofiles[i].ofd = fd;
+ }
+}
+
+/* Divide file into N chunks in round robin fashion. */
+
+static void
+lines_rr (size_t n, char *buf, size_t bufsize)
+{
+ of_t *ofiles = xnmalloc (n, sizeof *ofiles);
+ char *bp, *bp_out, *eob;
+ size_t n_read;
+ bool eof = false;
+ size_t i;
+ bool inc = false;
+
+ /* Generate output file names. */
+ for (i = 0; i < n; i++)
+ {
+ next_file_name ();
+ ofiles[i].of_name = xmalloc (strlen (outfile) + 1);
+ strcpy (ofiles[i].of_name, outfile);
+ ofiles[i].ofd = -1;
+ }
+ i = 0;
+
+ do
+ {
+ n_read = full_read (STDIN_FILENO, buf, bufsize);
+ if (n_read == SAFE_READ_ERROR)
+ error (EXIT_FAILURE, errno, "%s", infile);
+ if (n_read < bufsize)
+ {
+ if (n_read == 0)
+ break;
+ eof = true;
+ }
+ bp = buf;
+ eob = buf + n_read;
+
+
+ while (bp != eob)
+ {
+ /* Find end of line. */
+ bp_out = memchr (bp, eol, eob - bp);
+ if (bp_out)
+ {
+ bp_out++;
+ inc = true;
+ }
+ else
+ bp_out = eob;
+
+ /* Secure file descriptor. */
+ ofd_check (ofiles, i, n);
+
+ if (full_write (ofiles[i].ofd, bp, bp_out - bp) != bp_out - bp)
+ error (EXIT_FAILURE, errno, "%s", ofiles[i].of_name);
+ if (inc && ++i == n)
+ i = 0;
+ bp = bp_out;
+ inc = false;
+ }
+ }
+ while (!eof);
+
+ /* Close any open file descriptors. */
+ for (i = 0; i < n; i++)
+ if (-1 < ofiles[i].ofd)
+ close (ofiles[i].ofd);
+}
+
+/* Extract Nth of TOT eol delineated, round robin distributed chunks. */
+
+static void
+lines_rr_extract (uintmax_t n, uintmax_t tot, char *buf, size_t bufsize)
+{
+ int line_no = 1;
+ char *bp, *bp_out, *eob;
+ size_t n_read;
+ bool eof = false;
+ bool inc = false;
+
+ do
+ {
+ n_read = full_read (STDIN_FILENO, buf, bufsize);
+ if (n_read == SAFE_READ_ERROR)
+ error (EXIT_FAILURE, errno, "%s", infile);
+ if (n_read != bufsize)
+ {
+ if (n_read == 0)
+ break;
+ eof = true;
+ }
+ bp = buf;
+ eob = buf + n_read;
+
+ while (bp != eob)
+ {
+ /* Find end of line. */
+ bp_out = memchr (bp, eol, eob - bp);
+ if (bp_out)
+ {
+ bp_out++;
+ inc = true;
+ }
+ else
+ bp_out = eob;
+
+ if (line_no == n
+ && full_write (STDOUT_FILENO, bp, bp_out - bp) != bp_out - bp)
+ error (EXIT_FAILURE, errno, "output error");
+ if (inc)
+ line_no = (line_no == tot)? 1 : line_no + 1;
+ bp = bp_out;
+ inc = false;
+ }
+ }
+ while (!eof);
+}
+
#define FAIL_ONLY_ONE_WAY() \
do \
{ \
@@ -370,21 +713,159 @@ line_bytes_split (size_t n_bytes)
} \
while (0)
+/* Parse K/N syntax of chunk options. */
+
+static void
+chunk_parse (uintmax_t *m_units, uintmax_t *n_units, char *slash)
+{
+ *slash = '\0';
+ if (slash != optarg
+ && xstrtoumax (optarg, NULL, 10, m_units, "") != LONGINT_OK
+ || SIZE_MAX < *m_units)
+ {
+ error (0, 0, _("%s: invalid chunk number"), optarg);
+ usage (EXIT_FAILURE);
+ }
+ if (xstrtoumax (++slash, NULL, 10, n_units, "") != LONGINT_OK
+ || *n_units == 0 || *n_units < *m_units || SIZE_MAX < *n_units)
+ {
+ error (0, 0, _("%s: invalid number of total chunks"), slash);
+ usage (EXIT_FAILURE);
+ }
+}
+
+/* Parse eol character for -t option. */
+
+static void
+eol_parse (void)
+{
+ if (*optarg == '\\')
+ switch (*(optarg+1))
+ {
+ case 'a':
+ if (*(optarg + 2) != 0)
+ error (EXIT_FAILURE, 0, _("%s: invalid escape sequence"), optarg);
+ eol = '\a';
+ break;
+
+ case 'b':
+ if (*(optarg + 2) != 0)
+ error (EXIT_FAILURE, 0, _("%s: invalid escape sequence"), optarg);
+ eol = '\b';
+ break;
+
+ case 'f':
+ if (*(optarg + 2) != 0)
+ error (EXIT_FAILURE, 0, _("%s: invalid escape sequence"), optarg);
+ eol = '\f';
+ break;
+
+ case 'n':
+ if (*(optarg + 2) != 0)
+ error (EXIT_FAILURE, 0, _("%s: invalid escape sequence"), optarg);
+ eol = '\n';
+ break;
+
+ case 'r':
+ if (*(optarg + 2) != 0)
+ error (EXIT_FAILURE, 0, _("%s: invalid escape sequence"), optarg);
+ eol = '\r';
+ break;
+
+ case 't':
+ if (*(optarg + 2) != 0)
+ error (EXIT_FAILURE, 0, _("%s: invalid escape sequence"), optarg);
+ eol = '\t';
+ break;
+
+ case 'v':
+ if (*(optarg + 2) != 0)
+ error (EXIT_FAILURE, 0, _("%s: invalid escape sequence"), optarg);
+ eol = '\v';
+ break;
+
+ case '\'':
+ if (*(optarg + 2) != 0)
+ error (EXIT_FAILURE, 0, _("%s: invalid escape sequence"), optarg);
+ eol = '\'';
+ break;
+
+ case '\"':
+ if (*(optarg + 2) != 0)
+ error (EXIT_FAILURE, 0, _("%s: invalid escape sequence"), optarg);
+ eol = '\"';
+ break;
+
+ case '\\':
+ if (*(optarg + 2) != 0)
+ error (EXIT_FAILURE, 0, _("%s: invalid escape sequence"), optarg);
+ eol = '\\';
+ break;
+
+ case '0':
+ case '1':
+ case '2':
+ case '3':
+ case '4':
+ case '5':
+ case '6':
+ case '7':
+ {
+ char *term;
+ long int tmp;
+ if (xstrtol (optarg + 1, &term, 8, &tmp, "") != LONGINT_OK
+ || tmp < 0 || 255 < tmp ||4 + optarg < term || *term != 0)
+ error (EXIT_FAILURE, 0, _("%s: invalid octal escape sequence"),
+ optarg);
+ eol = (char) tmp;
+ break;
+ }
+
+ case 'x':
+ {
+ char *term;
+ long int tmp;
+ if (xstrtol (optarg + 2, &term, 16, &tmp, "") != LONGINT_OK
+ || tmp < 0 || 255 < tmp || 4 + optarg < term || *term != 0)
+ error (EXIT_FAILURE, 0, _("%s: invalid hex escape sequence"),
+ optarg);
+ eol = (char) tmp;
+ break;
+ }
+
+ default:
+ error (0, 0, _("%s: invalid escape sequence"), optarg);
+ usage (EXIT_FAILURE);
+ }
+ else
+ {
+ if (*(optarg + 1) != 0)
+ error (EXIT_FAILURE, 0, _("%s: invalid eol character"), optarg);
+ eol = *optarg;
+ }
+}
+
+
int
main (int argc, char **argv)
{
struct stat stat_buf;
enum
{
- type_undef, type_bytes, type_byteslines, type_lines, type_digits
+ type_undef, type_bytes, type_byteslines, type_lines, type_digits,
+ type_chunk_bytes, type_chunk_eol, type_rr
} split_type = type_undef;
size_t in_blk_size; /* optimal block size of input file device */
char *buf; /* file i/o buffer */
size_t page_size = getpagesize ();
+ uintmax_t m_units = 0;
uintmax_t n_units;
static char const multipliers[] = "bEGKkMmPTYZ0";
int c;
int digits_optind = 0;
+ size_t file_size;
+ char *slash;
+ bool eol_char = false;
initialize_main (&argc, &argv);
set_program_name (argv[0]);
@@ -404,7 +885,7 @@ main (int argc, char **argv)
/* This is the argv-index of the option we will read next. */
int this_optind = optind ? optind : 1;
- c = getopt_long (argc, argv, "0123456789C:a:b:dl:", longopts, NULL);
+ c = getopt_long (argc, argv, "0123456789C:a:b:dl:n:r:t:", longopts, NULL);
if (c == -1)
break;
@@ -426,6 +907,13 @@ main (int argc, char **argv)
case 'b':
if (split_type != type_undef)
FAIL_ONLY_ONE_WAY ();
+ slash = strchr (optarg, '/');
+ if (slash)
+ {
+ split_type = type_chunk_bytes;
+ chunk_parse (&m_units, &n_units, slash);
+ break;
+ }
split_type = type_bytes;
if (xstrtoumax (optarg, NULL, 10, &n_units, multipliers) != LONGINT_OK
|| n_units == 0)
@@ -438,6 +926,13 @@ main (int argc, char **argv)
case 'l':
if (split_type != type_undef)
FAIL_ONLY_ONE_WAY ();
+ slash = strchr (optarg, '/');
+ if (slash)
+ {
+ split_type = type_chunk_eol;
+ chunk_parse (&m_units, &n_units, slash);
+ break;
+ }
split_type = type_lines;
if (xstrtoumax (optarg, NULL, 10, &n_units, "") != LONGINT_OK
|| n_units == 0)
@@ -459,6 +954,42 @@ main (int argc, char **argv)
}
break;
+ case 'n':
+ if (split_type != type_undef)
+ FAIL_ONLY_ONE_WAY ();
+ split_type = type_chunk_bytes;
+ slash = strchr (optarg, '/');
+ if (slash)
+ {
+ chunk_parse (&m_units, &n_units, slash);
+ break;
+ }
+ if (xstrtoumax (optarg, NULL, 10, &n_units, "") != LONGINT_OK
+ || n_units == 0 || SIZE_MAX < n_units)
+ {
+ error (0, 0, _("%s: invalid number of chunks"), optarg);
+ usage (EXIT_FAILURE);
+ }
+ break;
+
+ case 'r':
+ if (split_type != type_undef)
+ FAIL_ONLY_ONE_WAY ();
+ split_type = type_rr;
+ slash = strchr (optarg, '/');
+ if (slash)
+ {
+ chunk_parse (&m_units, &n_units, slash);
+ break;
+ }
+ if (xstrtoumax (optarg, NULL, 10, &n_units, "") != LONGINT_OK
+ || n_units == 0 || SIZE_MAX < n_units)
+ {
+ error (0, 0, _("%s: invalid number of chunks"), optarg);
+ usage (EXIT_FAILURE);
+ }
+ break;
+
case '0':
case '1':
case '2':
@@ -492,6 +1023,11 @@ main (int argc, char **argv)
suffix_alphabet = "0123456789";
break;
+ case 't':
+ eol_parse ();
+ eol_char = true;
+ break;
+
case VERBOSE_OPTION:
verbose = true;
break;
@@ -505,6 +1041,17 @@ main (int argc, char **argv)
}
}
+ /* Default eol to \n if none specified. */
+ if (!eol_char)
+ eol = '\n';
+ else
+ {
+ if (split_type == type_chunk_bytes)
+ split_type = type_chunk_eol;
+ if (split_type == type_bytes)
+ split_type = type_byteslines;
+ }
+
/* Handle default case. */
if (split_type == type_undef)
{
@@ -546,10 +1093,15 @@ main (int argc, char **argv)
output_desc = -1;
/* Get the optimal block size of input device and make a buffer. */
-
if (fstat (STDIN_FILENO, &stat_buf) != 0)
error (EXIT_FAILURE, errno, "%s", infile);
in_blk_size = io_blksize (stat_buf);
+ file_size = stat_buf.st_size;
+
+ if (split_type == type_chunk_bytes || split_type == type_chunk_eol
+ || split_type == type_rr)
+ if (file_size < n_units)
+ error (EXIT_FAILURE, 0, _("number of chunks exceeds file size"));
buf = ptr_align (xmalloc (in_blk_size + 1 + page_size - 1), page_size);
@@ -561,13 +1113,34 @@ main (int argc, char **argv)
break;
case type_bytes:
- bytes_split (n_units, buf, in_blk_size);
+ bytes_split (n_units, buf, in_blk_size, 0);
break;
case type_byteslines:
line_bytes_split (n_units);
break;
+ case type_chunk_bytes:
+ if (m_units == 0)
+ bytes_split (file_size / n_units, buf, in_blk_size, n_units);
+ else
+ bytes_chunk_extract (m_units, n_units, buf, in_blk_size, file_size);
+ break;
+
+ case type_chunk_eol:
+ if (m_units == 0)
+ lines_chunk_split (n_units, buf, in_blk_size, file_size);
+ else
+ lines_chunk_extract (m_units, n_units, buf, in_blk_size, file_size);
+ break;
+
+ case type_rr:
+ if (m_units == 0)
+ lines_rr (n_units, buf, in_blk_size);
+ else
+ lines_rr_extract (m_units, n_units, buf, in_blk_size);
+ break;
+
default:
abort ();
}
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 85503cc..89d2e40 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -228,8 +228,12 @@ TESTS = \
misc/sort-rand \
misc/sort-version \
misc/split-a \
+ misc/split-bchunk \
misc/split-fail \
misc/split-l \
+ misc/split-lchunk \
+ misc/split-rchunk \
+ misc/split-t \
misc/stat-fmt \
misc/stat-hyphen \
misc/stat-printf \
diff --git a/tests/misc/split-bchunk b/tests/misc/split-bchunk
new file mode 100755
index 0000000..15c0d64
--- /dev/null
+++ b/tests/misc/split-bchunk
@@ -0,0 +1,46 @@
+#!/bin/sh
+# show that splitting into 3 byte delineated chunks works.
+
+# Copyright (C) 2009 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+if test "$VERBOSE" = yes; then
+ set -x
+ split --version
+fi
+. $srcdir/test-lib.sh
+
+printf '1\n2\n3\n4\n5\n' > in || framework_failure
+
+split --bytes=/3 in > out || fail=1
+split --bytes=1/3 in > b1 || fail=1
+split --bytes=2/3 in > b2 || fail=1
+split --bytes=3/3 in > b3 || fail=1
+printf '1\n2' > exp-1
+printf '\n3\n' > exp-2
+printf '4\n5\n' > exp-3
+
+compare xaa exp-1 || fail=1
+compare xab exp-2 || fail=1
+compare xac exp-3 || fail=1
+compare b1 exp-1 || fail=1
+compare b2 exp-2 || fail=1
+compare b3 exp-3 || fail=1
+test -f xad && fail=1
+
+# Splitting into more chunks than file size should fail.
+split --bytes=/20 in 2> /dev/null && fail=1
+
+Exit $fail
diff --git a/tests/misc/split-fail b/tests/misc/split-fail
index e36c86d..4a0c9c3 100755
--- a/tests/misc/split-fail
+++ b/tests/misc/split-fail
@@ -29,8 +29,11 @@ touch in || framework_failure
split -a 0 in 2> /dev/null || fail=1
split -b 0 in 2> /dev/null && fail=1
+split -b /0 in 2> /dev/null && fail=1
split -C 0 in 2> /dev/null && fail=1
split -l 0 in 2> /dev/null && fail=1
+split -l /0 in 2> /dev/null && fail=1
+split -t in 2> /dev/null && fail=1
# Make sure -C doesn't create empty files.
rm -f x?? || fail=1
@@ -64,5 +67,10 @@ split: line count option -99*... is too large
EOF
compare out exp || fail=1
+# Make sure invalid -t characters are not accepted.
+split -tab in 2> /dev/null && fail=1
+split -t\\nb in 2> /dev/null && fail=1
+split -t\\8 in 2> /dev/null && fail=1
+split -t\\x1FF in 2> /dev/null && fail=1
Exit $fail
diff --git a/tests/misc/split-l b/tests/misc/split-l
index fb07a27..850d5b5 100755
--- a/tests/misc/split-l
+++ b/tests/misc/split-l
@@ -18,7 +18,7 @@
if test "$VERBOSE" = yes; then
set -x
- ln --version
+ split --version
fi
. $srcdir/test-lib.sh
diff --git a/tests/misc/split-lchunk b/tests/misc/split-lchunk
new file mode 100755
index 0000000..cb71939
--- /dev/null
+++ b/tests/misc/split-lchunk
@@ -0,0 +1,56 @@
+#!/bin/sh
+# show that splitting into 3 newline delineated chunks works.
+
+# Copyright (C) 2009 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+if test "$VERBOSE" = yes; then
+ set -x
+ split --version
+fi
+
+. $srcdir/test-lib.sh
+
+printf '1\n2\n3\n4\n5\n' > in || framework_failure
+
+split --lines=/3 in > out || fail=1
+split --lines=1/3 in > l1 || fail=1
+split --lines=2/3 in > l2 || fail=1
+split --lines=3/3 in > l3 || fail=1
+
+cat <<\EOF > exp-1
+1
+2
+EOF
+cat <<\EOF > exp-2
+3
+EOF
+cat <<\EOF > exp-3
+4
+5
+EOF
+
+compare xaa exp-1 || fail=1
+compare xab exp-2 || fail=1
+compare xac exp-3 || fail=1
+compare l1 exp-1 || fail=1
+compare l2 exp-2 || fail=1
+compare l3 exp-3 || fail=1
+test -f xad && fail=1
+
+# Splitting into more chunks than file size should fail.
+split --lines=/20 in 2> /dev/null && fail=1
+
+Exit $fail
diff --git a/tests/misc/split-rchunk b/tests/misc/split-rchunk
new file mode 100755
index 0000000..080e6a2
--- /dev/null
+++ b/tests/misc/split-rchunk
@@ -0,0 +1,56 @@
+#!/bin/sh
+# show that splitting into 3 round-robin chunks works.
+
+# Copyright (C) 2009 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+if test "$VERBOSE" = yes; then
+ set -x
+ split --version
+fi
+
+. $srcdir/test-lib.sh
+
+printf '1\n2\n3\n4\n5\n' > in || framework_failure
+
+split --round-robin=/3 in > out || fail=1
+split --round-robin=1/3 in > r1 || fail=1
+split --round-robin=2/3 in > r2 || fail=1
+split --round-robin=3/3 in > r3 || fail=1
+
+cat <<\EOF > exp-1
+1
+4
+EOF
+cat <<\EOF > exp-2
+2
+5
+EOF
+cat <<\EOF > exp-3
+3
+EOF
+
+compare xaa exp-1 || fail=1
+compare xab exp-2 || fail=1
+compare xac exp-3 || fail=1
+compare r1 exp-1 || fail=1
+compare r2 exp-2 || fail=1
+compare r3 exp-3 || fail=1
+test -f xad && fail=1
+
+# Splitting into more chunks than file size should fail.
+split --round-robin=/20 in 2> /dev/null && fail=1
+
+Exit $fail
diff --git a/tests/misc/split-t b/tests/misc/split-t
new file mode 100755
index 0000000..4fba0f2
--- /dev/null
+++ b/tests/misc/split-t
@@ -0,0 +1,39 @@
+#!/bin/sh
+# show that splitting with '\0' as the eol char works.
+
+# Copyright (C) 2009 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+if test "$VERBOSE" = yes; then
+ set -x
+ split --version
+fi
+
+. $srcdir/test-lib.sh
+
+printf 'a\0b\0c\0d\0e\0' > in || framework_failure
+
+split -l 2 -t \\0 in > out || fail=1
+
+printf 'a\0b\0' > exp-1
+printf 'c\0d\0' > exp-2
+printf 'e\0' > exp-3
+
+compare xaa exp-1 || fail=1
+compare xab exp-2 || fail=1
+compare xac exp-3 || fail=1
+test -f xad && fail=1
+
+Exit $fail
--
1.6.3.3
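
The round-robin extraction performed by lines_rr_extract in the patch above
can be sketched on an in-memory buffer. This is an illustrative sketch only:
the helper name rr_extract and its buffer-based interface are assumptions made
for the example, whereas the patch itself reads from standard input in blocks
and writes to standard output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the patch): copy into OUT the lines
   of BUF (LEN bytes, each line terminated by EOL) that belong to
   round-robin chunk K of TOTAL, counting from 1.  OUT must have room
   for LEN bytes.  Return the number of bytes copied.  */
static size_t
rr_extract (const char *buf, size_t len, char eol,
            size_t k, size_t total, char *out)
{
  const char *bp = buf;
  const char *eob = buf + len;
  size_t line_no = 1;
  size_t n_out = 0;

  assert (total != 0 && 1 <= k && k <= total);

  while (bp != eob)
    {
      const char *end = memchr (bp, eol, eob - bp);
      /* Keep the terminator with its line; a final unterminated
         line runs to the end of the buffer.  */
      end = end ? end + 1 : eob;
      if (line_no == k)
        {
          memcpy (out + n_out, bp, end - bp);
          n_out += end - bp;
        }
      /* Lines are dealt out to chunks 1, 2, ..., TOTAL, 1, 2, ...  */
      line_no = (line_no == total) ? 1 : line_no + 1;
      bp = end;
    }
  return n_out;
}
```

For the five-line input used in tests/misc/split-rchunk and TOTAL = 3, chunk 1
receives lines 1 and 4, chunk 2 receives lines 2 and 5, and chunk 3 receives
line 3, matching the expected files in that test.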