[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: RFC: new cp option: --efficient-sparse=HOW
From: |
Jim Meyering |
Subject: |
Re: RFC: new cp option: --efficient-sparse=HOW |
Date: |
Mon, 31 Jan 2011 23:14:32 +0100 |
Jim Meyering wrote:
> Now that we have can read sparse files efficiently,
> what if I want to copy a 20PiB sparse file, and yet I want to
> be sure that it does so efficiently. Few people can afford
> to wait around while a normal processor and storage system process
> that much raw data. But if it's a sparse file and the src and dest
> file systems have the right support (FIEMAP ioctl), then it'll be
> copied in the time it takes to make a few syscalls.
>
> Currently, when the efficient sparse copy fails, cp falls back
> on the regular, expensive, read-every-byte approach.
>
> This proposal adds an option, --efficient-sparse=required,
> to make cp fail if the initial attempt to read the sparse file fails,
> rather than resorting to the regular (very slow in the above case) copy
> procedure.
>
> The default is --efficient-sparse=auto, and for symmetry,
> I've provided --efficient-sparse=never, in case someone finds
> a reason to want to skip the ioctl.
>
> You can demonstrate this new feature on a tmpfs file system,
> since it supports sparse files, but not the FIEMAP ioctl:
>
> $ cd /dev/shm
> $ truncate -s128K k
> $ cp --efficient=required k kk
> cp: unable to read sparse `k' efficiently
> [Exit 1]
>
> Here's a preliminary patch
> (not including texinfo changes)
> I'll add tests, too, of course.
And NEWS.
Here's that same patch, but now with a proper ChangeLog:
>From c83ea420c64169a7db58189cce6d3e755eb7b717 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Mon, 31 Jan 2011 23:13:36 +0100
Subject: [PATCH] cp: support new option: --efficient-sparse=HOW
Now that we have can read sparse files efficiently,
what if I want to copy a 20PiB sparse file, and yet I want to
be sure that it does so efficiently? Few people can afford
to wait around while a normal processor and storage system process
that much raw data. But if it's a sparse file and the src and dest
file systems have the right support (FIEMAP ioctl), then it'll be
copied in the time it takes to make a few syscalls.
Currently, when the efficient sparse copy fails, cp falls back
on the regular, expensive, read-every-byte approach.
This proposal adds an option, --efficient-sparse=required,
to make cp fail if the initial attempt to read the sparse file fails,
rather than resorting to the regular (very slow in the above case) copy
procedure.
The default is --efficient-sparse=auto, and for symmetry,
I've provided --efficient-sparse=never, in case someone finds
a reason to want to skip the ioctl.
You can demonstrate this new feature on a tmpfs file system,
since it supports sparse files, but not the FIEMAP ioctl:
$ cd /dev/shm
$ truncate -s128K k
$ cp --efficient=required k kk
cp: unable to read sparse `k' efficiently
[Exit 1]
* src/copy.h (enum Sparse_efficiency): Declare.
(struct cp_options) [sparse_efficiency]: New member.
* src/copy.c (word, cp_options_default):
(extent_copy): Add description for a parameter.
(copy_reg): Remember result of src_is_sparse heuristic...
and test that when extent_copy fails.
Don't call extent_copy for SPARSE_EFF_NEVER.
(cp_options_default): Initialize new member.
* src/cp.c (eff_sparse_type, long_opts, main): Support new options.
(usage): Document them.
---
src/copy.c | 32 +++++++++++++++++++++++++-------
src/copy.h | 19 +++++++++++++++++++
src/cp.c | 36 ++++++++++++++++++++++++++++++++++++
3 files changed, 80 insertions(+), 7 deletions(-)
diff --git a/src/copy.c b/src/copy.c
index 04c678d..72425af 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -305,8 +305,8 @@ write_zeros (int fd, uint64_t n_bytes)
copy, and thus makes copying sparse files much more efficient.
Upon a successful copy, return true. If the initial extent scan
fails, set *NORMAL_COPY_REQUIRED to true and return false.
- Upon any other failure, set *NORMAL_COPY_REQUIRED to false and
- return false. */
+ Upon any other failure, give a diagnostic, set *NORMAL_COPY_REQUIRED
+ to false and return false. */
static bool
extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
off_t src_total_size, bool make_holes,
@@ -931,6 +931,7 @@ copy_reg (char const *src_name, char const *dst_name,
/* Deal with sparse files. */
bool make_holes = false;
+ bool src_is_sparse = false;
if (S_ISREG (sb.st_mode))
{
/* Even with --sparse=always, try to create holes only
@@ -943,9 +944,13 @@ copy_reg (char const *src_name, char const *dst_name,
blocks. If the file has fewer blocks than would normally be
needed for a file of its size, then at least one of the blocks in
the file is a hole. */
+
if (x->sparse_mode == SPARSE_AUTO && S_ISREG (src_open_sb.st_mode)
&& ST_NBLOCKS (src_open_sb) < src_open_sb.st_size /
ST_NBLOCKSIZE)
- make_holes = true;
+ {
+ make_holes = true;
+ src_is_sparse = true;
+ }
#endif
}
@@ -977,18 +982,30 @@ copy_reg (char const *src_name, char const *dst_name,
buf_alloc = xmalloc (buf_size + buf_alignment_slop);
buf = ptr_align (buf_alloc, buf_alignment);
- bool normal_copy_required;
+ bool normal_copy_required = true;
/* Perform an efficient extent-based copy, falling back to the
standard copy only if the initial extent scan fails. If the
'--sparse=never' option is specified, write all data but use
any extents to read more efficiently. */
- if (extent_copy (source_desc, dest_desc, buf, buf_size,
- src_open_sb.st_size, make_holes,
- src_name, dst_name, &normal_copy_required))
+
+ if (x->sparse_efficiency != SPARSE_EFF_NEVER
+ && extent_copy (source_desc, dest_desc, buf, buf_size,
+ src_open_sb.st_size, make_holes,
+ src_name, dst_name, &normal_copy_required))
goto preserve_metadata;
if (! normal_copy_required)
{
+ /* extent_copy already diagnosed the failure */
+ return_val = false;
+ goto close_src_and_dst_desc;
+ }
+
+ /* extent_copy failed, and we are instructed not to fall-back */
+ if (src_is_sparse && x->sparse_efficiency == SPARSE_EFF_REQUIRED)
+ {
+ error (0, 0, _("unable to read sparse %s efficiently"),
+ quote (src_name));
return_val = false;
goto close_src_and_dst_desc;
}
@@ -2519,6 +2536,7 @@ cp_options_default (struct cp_options *x)
#else
x->chown_privileges = x->owner_privileges = (geteuid () == 0);
#endif
+ x->sparse_efficiency = SPARSE_EFF_AUTO;
}
/* Return true if it's OK for chown to fail, where errno is
diff --git a/src/copy.h b/src/copy.h
index 5014ea9..fab131b 100644
--- a/src/copy.h
+++ b/src/copy.h
@@ -22,6 +22,22 @@
# include <stdbool.h>
# include "hash.h"
+/* Control efficient reading of sparse files. On some systems, you can
+ use the FIEMAP ioctl to read only the non-sparse parts of a file. */
+enum Sparse_efficiency
+{
+ /* Do not attempt to treat sparse source files specially. */
+ SPARSE_EFF_NEVER,
+
+ /* Attempt to read sparse files efficiently, but if that is not
+ possible, fall back on the regular, less-efficient approach. */
+ SPARSE_EFF_AUTO,
+
+ /* Read sparse files efficiently, and if that is not possible,
+ then treat it as failure to copy. */
+ SPARSE_EFF_REQUIRED
+};
+
/* Control creation of sparse files (files with holes). */
enum Sparse_type
{
@@ -110,6 +126,9 @@ struct cp_options
/* Control creation of sparse files. */
enum Sparse_type sparse_mode;
+ /* Control efficient reading of sparse files. */
+ enum Sparse_efficiency sparse_efficiency;
+
/* Set the mode of the destination file to exactly this value
if SET_MODE is nonzero. */
mode_t mode;
diff --git a/src/cp.c b/src/cp.c
index 859f21b..711e229 100644
--- a/src/cp.c
+++ b/src/cp.c
@@ -74,6 +74,7 @@ enum
{
ATTRIBUTES_ONLY_OPTION = CHAR_MAX + 1,
COPY_CONTENTS_OPTION,
+ EFFICIENT_SPARSE_OPTION,
NO_PRESERVE_ATTRIBUTES_OPTION,
PARENTS_OPTION,
PRESERVE_ATTRIBUTES_OPTION,
@@ -93,6 +94,16 @@ static bool parents_option = false;
/* Remove any trailing slashes from each SOURCE argument. */
static bool remove_trailing_slashes;
+static char const *const eff_sparse_type_string[] =
+{
+ "never", "auto", "required", NULL
+};
+static enum Sparse_type const eff_sparse_type[] =
+{
+ SPARSE_EFF_NEVER, SPARSE_EFF_AUTO, SPARSE_EFF_REQUIRED
+};
+ARGMATCH_VERIFY (eff_sparse_type_string, eff_sparse_type);
+
static char const *const sparse_type_string[] =
{
"never", "auto", "always", NULL
@@ -120,6 +131,7 @@ static struct option const long_opts[] =
{"backup", optional_argument, NULL, 'b'},
{"copy-contents", no_argument, NULL, COPY_CONTENTS_OPTION},
{"dereference", no_argument, NULL, 'L'},
+ {"efficient-sparse", required_argument, NULL, EFFICIENT_SPARSE_OPTION},
{"force", no_argument, NULL, 'f'},
{"interactive", no_argument, NULL, 'i'},
{"link", no_argument, NULL, 'l'},
@@ -177,6 +189,9 @@ Mandatory arguments to long options are mandatory for short
options too.\n\
-d same as --no-dereference --preserve=links\n\
"), stdout);
fputs (_("\
+ --efficient-sparse=HOW control efficient reading of sparse files.\n\
+"), stdout);
+ fputs (_("\
-f, --force if an existing destination file cannot be\n\
opened, remove it and try again (redundant if\
\n\
@@ -247,6 +262,21 @@ fails, or if --reflink=auto is specified, fall back to a
standard copy.\n\
"), stdout);
fputs (_("\
\n\
+By default, cp tries to read sparse SOURCE files efficiently, but if the\n\
+required capability is not available it resorts to copying the usual way.\n\
+--efficient-sparse=auto is the default. One case in which you would not\n\
+want to fall back on the usual method is when you are copying a very large,\n\
+mostly-sparse file, and processing all bytes in the nominal size would take\n\
+too long.\
+"), stdout);
+ fputs (_("\
+ In that case, use --efficient-sparse=required to make cp fail if\n\
+the efficient method does not work. I.e., tell cp not to resort to the\n\
+less-efficient method. Finally, --efficient-sparse=never makes cp skip the\n\
+attempt to copy efficiently.\n\
+"), stdout);
+ fputs (_("\
+\n\
The backup suffix is `~', unless set with --suffix or SIMPLE_BACKUP_SUFFIX.\n\
The version control method may be selected via the --backup option or
through\n\
the VERSION_CONTROL environment variable. Here are the values:\n\
@@ -944,6 +974,12 @@ main (int argc, char **argv)
sparse_type_string, sparse_type);
break;
+ case EFFICIENT_SPARSE_OPTION:
+ x.sparse_efficiency = XARGMATCH ("--efficient-sparse", optarg,
+ eff_sparse_type_string,
+ eff_sparse_type);
+ break;
+
case REFLINK_OPTION:
if (optarg == NULL)
x.reflink_mode = REFLINK_ALWAYS;
--
1.7.3.5.44.g960a