
[coreutils] Re: [Bug-tar] [PATCH] improved sparse file detection


From: Eric Blake
Subject: [coreutils] Re: [Bug-tar] [PATCH] improved sparse file detection
Date: Tue, 24 Aug 2010 16:45:19 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.8) Gecko/20100806 Fedora/3.1.2-1.fc13 Mnenhy/0.8.3 Thunderbird/3.1.2

[adding coreutils]

On 08/24/2010 09:17 AM, Bernd Schubert wrote:
> Hi all,
>
> for improved stat() performance the Lustre filesystem uses entirely empty
> sparse files on its metadata target (MDT). Now with hundreds of millions of
> huge sparse files, creating a backup of the MDT using vanilla
> gnu-tar is basically impossible, as it needs far too much time to detect
> sparse files.

Coreutils cp(1) has recently started using code to efficiently iterate over the locations of all holes within a sparse file, with the goal of eventually supporting both the Linux ioctls and the Solaris SEEK_HOLE interface. I think that could also be leveraged rather nicely for tar's detection of sparse files, by stopping the iteration as soon as the first hole has been found. In particular, it would also rapidly classify files that are only partially sparse, whereas the description of your patch implies that it only speeds up detection of completely sparse files and offers nothing for partially sparse ones. Thus, coreutils' sparse file handling is a great candidate for migrating into gnulib and sharing among several projects.
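
To make the "stop after the first hole" idea concrete, here is a minimal sketch in Python rather than C. It is not the coreutils or tar code; the function name first_hole_offset is hypothetical, and it assumes a platform where os.SEEK_HOLE is available and the filesystem reports holes. Where holes are not reported, lseek returns the end of the file, which the sketch treats as "no hole found".

```python
import os


def first_hole_offset(path):
    """Return the offset of the first hole before EOF, or None.

    None means no hole was reported: the file is empty, fully
    allocated, or the platform/filesystem cannot report holes.
    (Hypothetical helper, for illustration only.)
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        # An empty file has no holes; lseek would fail with ENXIO.
        if size == 0 or not hasattr(os, "SEEK_HOLE"):
            return None
        try:
            # One call: the search stops at the first hole, so the
            # rest of the file is never examined.
            hole = os.lseek(fd, 0, os.SEEK_HOLE)
        except OSError:
            # Assumption: treat an unsupported seek as "unknown".
            return None
        # Every file has an implicit hole at EOF; ignore that one.
        return hole if hole < size else None
    finally:
        os.close(fd)
```

A caller that only needs a yes/no answer could test whether the result is not None and fall back to a slower block scan when it is, since None does not prove the file is dense.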

Meanwhile, if you are indeed correct that there are easy ways to detect completely sparse files, even when neither the ioctl nor SEEK_HOLE is available, then the coreutils cp(1) hole iteration routine should probably be taught that corner case, so that it reports an entirely sparse file as a single hole.
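
For illustration, one commonly used heuristic for that corner case compares the file size against the allocated block count from stat(). This is a sketch under the assumption that the check is of this kind; it is not taken from the patch, and the function names are hypothetical. It is only a heuristic: some filesystems may report zero blocks for files that do hold data, for example when small files are stored inline or blocks are allocated lazily.

```python
import os


def looks_entirely_sparse(size, blocks):
    """Heuristic: a non-empty file with no allocated blocks.

    'size' is st_size in bytes; 'blocks' is st_blocks.
    (Hypothetical helper, for illustration only.)
    """
    return size > 0 and blocks == 0


def path_looks_entirely_sparse(path):
    """Apply the heuristic to a path using a single stat() call."""
    st = os.stat(path)
    return looks_entirely_sparse(st.st_size, st.st_blocks)
```

The appeal of this check is that it needs no read or seek at all, which matters when hundreds of millions of files have to be classified.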

> PS: I'm used to linux-style indentation and I'm not sure if I did it the right
> way. If it is wrong, please complain and I will try to reformat it.

Thanks for taking the time to contribute a patch. However, the diffstat says that your patch is large enough to fall outside the bounds of trivial submissions, so I quit reading it to avoid any copyright issues. Would you be willing to assign copyright to the FSF? If so, we can start the paperwork process off-list.

--
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org


