[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: sparse file processing improvements
From: |
Pádraig Brady |
Subject: |
Re: sparse file processing improvements |
Date: |
Wed, 08 Oct 2014 00:08:28 +0100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2 |
On 10/07/2014 07:56 PM, Pádraig Brady wrote:
> On 10/07/2014 02:50 AM, Bernhard Voelker wrote:
>> On 10/06/2014 04:40 PM, Pádraig Brady wrote:
>>> I've attached 3 patches to improve various sparse file handling.
>>> 1. supports detecting holes < internal buffer size (currently 128KiB)
>>> 2. employs the punch hole scheme demonstrated above
>>> 3. reads sparse files efficiently even if not writing to a regular file
>>>
>>> thanks,
>>> Pádraig.
>>
>> Hi Padraig,
>>
>> cool work, thanks!
>>
>> At a first glance, the changes in copy.c seem to be correct.
>> Without having a deeper look yet, the new inner loop to detect
>> holes in the just-read buffer seems to do the lseek() twice if
>> a hole would span adjacent, outer-loop read()s.
>> The ls(1) output of the result in the new test seems to second
>> my guess:
>>
>> + ls -lsh sparse.in sparse.out sparse.out2
>> 512K -rw-r--r-- 1 voelkerb users 512K Oct 6 20:03 sparse.in
>> 260K -rw-r--r-- 1 voelkerb users 512K Oct 6 20:03 sparse.out
>> 256K -rw-r--r-- 1 voelkerb users 512K Oct 6 20:03 sparse.out2
>>
>> This means the copying could be even more effective in the first
>> run, right?
>
> Yes the patch would do at least 1 lseek per outer read loop.
> The extra 4K above I think is just a side effect on ext4 of the trailing
> lseek,
> rather than just leaving it to truncate, or maybe just alignment slop.
> But you're right, both could be improved by hoisting the hole control
> vars above the read loop and only doing 1 lseek for each hole,
> and only a truncate for the last hole. The fiemap based copy
> also avoids the last lseek, so this would be more consistent as well.
>
> The incremental diff is simple enough:
>
> diff --git a/src/copy.c b/src/copy.c
> index 12af6db..d32a131 100644
> --- a/src/copy.c
> +++ b/src/copy.c
> @@ -165,6 +165,8 @@ sparse_copy (int src_fd, int dest_fd, char *buf, size_t
> buf_size,
> {
> *last_write_made_hole = false;
> *total_n_read = 0;
> + bool make_hole = false;
> + off_t psize = 0;
>
> while (max_n_read)
> {
> @@ -182,10 +184,8 @@ sparse_copy (int src_fd, int dest_fd, char *buf, size_t
> buf_size,
> *total_n_read += n_read;
>
> /* Loop over the input buffer in chunks of hole_size. */
> - bool make_hole = false;
> size_t csize = make_holes ? hole_size : buf_size;
> char *cbuf = buf;
> - size_t psize = 0;
> char *pbuf = buf;
>
> while (n_read)
> @@ -207,7 +207,7 @@ sparse_copy (int src_fd, int dest_fd, char *buf, size_t
> buf_size,
> }
>
> bool transition = (make_hole != prev_hole) && psize;
> - bool last_chunk = (n_read == csize) || ! csize;
> + bool last_chunk = (n_read == csize && ! make_hole) || ! csize;
>
> if (transition || last_chunk)
> {
> @@ -231,16 +231,29 @@ sparse_copy (int src_fd, int dest_fd, char *buf, size_t
> buf_size,
> }
> }
>
> - pbuf += psize;
> + pbuf = cbuf;
> psize = csize;
>
> - if (transition && last_chunk)
> - csize = 0;
> + if (last_chunk)
> + {
> + if (transition)
> + csize = 0; /* Loop again to deal with last chunk. */
> + else
> + psize = 0; /* Reset for next read loop. */
> + }
> else if (! csize)
> - n_read = 0;
> + n_read = 0; /* Finished processing buffer. */
> }
> else /* Coalesce writes/seeks. */
> - psize += csize;
> + {
> + if (psize <= OFF_T_MAX - csize)
> + psize += csize;
> + else
> + {
> + error (0, 0, _("overflow reading %s"), quote (src_name));
> + return false;
> + }
> + }
>
> n_read -= csize;
> cbuf += csize;
>
>>> diff --git a/NEWS b/NEWS
>>> index 1811ae4..785773f 100644
>>> --- a/NEWS
>>> +++ b/NEWS
>>> @@ -30,6 +30,9 @@ GNU coreutils NEWS -*-
>>> outline -*-
>>>
>>> ** Improvements
>>>
>>> + cp will convert smaller runs of NULs in the input to holes,
>>> + to reduce allocation in the copy.
>>> +
>>
>> Aren't ginstall(1) and mv(1) also somehow affected? If so, then
>> the NEWS should mention this.
>
> done
>
>>> diff --git a/tests/cp/sparse.sh b/tests/cp/sparse.sh
>>> index d6cc4c4..1414d35 100755
>>> --- a/tests/cp/sparse.sh
>>> +++ b/tests/cp/sparse.sh
>>> @@ -37,4 +37,32 @@ test $(stat --printf %b copy) -le $(stat --printf %b
>>> sparse) || fail=1
>>> cp --sparse=always --reflink sparse copy && fail=1
>>> cp --sparse=never --reflink sparse copy && fail=1
>>>
>>> +
>>> +# Ensure we handle sparse/non-sparse transitions correctly
>>> +maxn=128 # how many $hole_size chunks per file
>>> +hole_size=$(stat -c %o copy)
>>> +dd if=/dev/zero bs=$hole_size count=$maxn of=zeros
>>> +tr '\0' '\1' < zeros > ones
>>
>> Test framework failures during the above commands (and below)
>> should be caught.
>
> done
>
>>> +for n in 1 2 3 4 32 $maxn; do
>>> + parts=$(expr $maxn / $n)
>>> +
>>> + rm -f sparse.in
>>> +
>>> + # Generate sparse file for copying with alternating
>>> + # hole/data patterns of size n * $hole_size
>>> + for i in $(yes zeros | sed 1~2s/zeros/ones/ | head -n$parts); do
>>> + dd iflag=fullblock if=$i of=sparse.in conv=notrunc oflag=append \
>>> + bs=$hole_size count=$n status=none || framework_failure_
>>> + done
>>
>> a)
>> Looking at the ls(1) output above, the generated sparse.in files are
>> not sparse, are they? I think it doesn't matter, and in fact the file
>> to copy first does not have to be sparse. This is what the second
>> copying below is for. So just s/sparse/sample/ in the comment maybe.
>
> Right the first file should not be sparse.
> Otherwise we get only the fiemap code for handling holes.
> Comment fixed.
>
>> b)
>> Instead of appending to the sample file with "of=sparse.in conv=notrunc"
>> " oflag=append", you could just redirect the for-loop's output to that
>> file.
>
> maybe, though more data copying in that case
>
>> c)
>> I'm not sure if the sed command "1~2s" is available everywhere
>> (TBH I've seen it the first time myself).
>> What about avoiding the sed(1) call completely by moving the
>> alternating logic into the yes(1) call?
>>
>> s="$(printf "%s\n%s" zeros ones)"
>> for i in $(yes "$s" | head -n$parts); do
>> ...
>
> Good point I need to make that portable.
>
>>> +
>>> + cp --sparse=always sparse.in sparse.out || fail=1 # non sparse input
>>> + cp --sparse=always sparse.out sparse.out2 || fail=1 # sparse input
>>> +
>>> + cmp sparse.in sparse.out || fail=1
>>> + cmp sparse.in sparse.out2 || fail=1
>>> +
>>> + ls -lsh sparse.*
>>
>> Given the sparse-detection and -punching is optimal, then both
>> output file should always have the same size, shouldn't they?
>> Therefore, the test should check those numbers.
>
> I'm very wary of checking the numbers as they could vary
> due to alignment, or async reclamation on the file system.
>
>>> +done
>>> +
>>> Exit $fail
>>
>> Although the test data seems to be quite good, I somehow
>> have the feeling that it should be even more random.
>> In the above, the blocks of NULs and 1s are always a multiple
>> of $holesize ... which in turn makes it all somehow related
>> to the buffer size and read size in sparse_copy().
>> I think we should use varying block sizes around the block size
>> for the NULs and 1s blocks to be closer to real-world data.
>> WDYT?
>
> Well going below the $holesize would only test is_nul(),
> and that's implicitly tested with the data comparisons.
> Oh I should change from \1 to 'U' since \1 is used as the
> sentinel flag within the code.
>
> I also noticed small issue in copy where an erroneous
> error about extending the file would be output when
> reporting errors when sparse_copy fails. That's fixed
> up is a separate patch. All 4 patches are attached here.
Actually I'll need to rework this set slightly to
keep the last lseek in sparse_copy() as it's needed
for the fiemap case, when there are non empty adjacent
extents that end in NULs.
Pádraig.