bug-coreutils

From: Pádraig Brady
Subject: bug#46048: split -n K/N loses data, sum of output files is smaller than input file.
Date: Sun, 24 Jan 2021 16:52:57 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:84.0) Gecko/20100101 Thunderbird/84.0

On 23/01/2021 04:58, Paul Hirst wrote:
> split --number K/N appears to lose data, with the sum of the sizes of
> the output files being smaller than the original input file by 131072 bytes.
>
> $ split --version
> split (GNU coreutils) 8.30
> ...
>
> $ head -c 1000000 < /dev/urandom > test.dat
> $ split --number=1/4 test.dat > t1
> $ split --number=2/4 test.dat > t2
> $ split --number=3/4 test.dat > t3
> $ split --number=4/4 test.dat > t4
>
> $ ls -l
> -rw-r--r-- 1 user user  250000 Jan 22 18:36 t1
> -rw-r--r-- 1 user user  250000 Jan 22 18:36 t2
> -rw-r--r-- 1 user user  250000 Jan 22 18:36 t3
> -rw-r--r-- 1 user user  118928 Jan 22 18:36 t4
> -rw-r--r-- 1 user user 1000000 Jan 22 18:33 test.dat
>
> Surely this should not be the case?

Ugh. This functionality was broken for all files larger than 128KiB,
due to adjustments for handling /dev/zero.

$ truncate -s 1000000 test.dat
$ split --number=4/4 test.dat | wc -c
118928

The following patch fixes it here.
I need to do some more testing before committing.

thanks!

diff --git a/src/split.c b/src/split.c
index 0660da13f..6aa8d50e9 100644
--- a/src/split.c
+++ b/src/split.c
@@ -1001,7 +1001,7 @@ bytes_chunk_extract (uintmax_t k, uintmax_t n, char *buf, size_t bufsize,
     }
   else
     {
-      if (lseek (STDIN_FILENO, start, SEEK_CUR) < 0)
+      if (lseek (STDIN_FILENO, start, SEEK_SET) < 0)
         die (EXIT_FAILURE, errno, "%s", quotef (infile));
       initial_read = SIZE_MAX;
     }
