Re: [rdiff-backup-users] Q. on max-file-size behavior
From: Maarten Bezemer
Subject: Re: [rdiff-backup-users] Q. on max-file-size behavior
Date: Sun, 14 Mar 2010 22:27:33 +0100 (CET)
On Sun, 14 Mar 2010, Whit Blauvelt wrote:
> On Sun, Mar 14, 2010 at 03:31:13PM +0100, Maarten Bezemer wrote:
>> I don't think this is even a corner case. If you want to exclude
>> large files, then a file that is larger than the limit you specify
>> (something you explicitly and deliberately do!) should not be in the
>> backup. Also, it should not _remain_ in the 'current' backup tree,
>> because it would no longer match the original in the source tree.
>> Since rdiff-backup keeps history of the backups, there is no other
>> way than to treat it as 'deleted from the source'. That's the only
>> way to keep the history intact AND have a proper 'current' backup
>> tree.
> Here's how the corner case occurs:
[snip]
I do understand when your 'problem case' happens. Not only would it happen
when you lower the maximum file size in later runs, it would also happen
when you have files steadily growing over the size limit.
IF you tell rdiff-backup "I do not want files larger than X in my backup",
then clearly all rdiff-backup can do is... not include them in the backup.
There is no difference between "I don't want them" and "They don't exist",
as far as the backup application is concerned. You don't want them? Fine,
you don't get them. But you also don't get an older version since that
would make no sense either.
Quoting from the manpage:
" When backing up, if a file is excluded, rdiff-backup acts as if that
file does not exist in the source directory."
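To make the manpage sentence concrete, here is a toy model in Python. It is only a sketch of the rule as quoted, not rdiff-backup's code; the names visible_files and plan_run and the dict-of-sizes layout are made up for illustration.

```python
def visible_files(source, max_size=None):
    """Return the part of `source` (path -> size in bytes) that a run gets to see."""
    if max_size is None:
        return dict(source)
    # An excluded file is simply absent from the run's view of the source.
    return {path: size for path, size in source.items() if size <= max_size}


def plan_run(mirror, source, max_size=None):
    """Classify paths for one backup run against the current mirror."""
    seen = visible_files(source, max_size)
    return {
        "new": sorted(set(seen) - set(mirror)),
        "kept": sorted(set(seen) & set(mirror)),
        # Really-deleted files and excluded files end up here alike.
        "treated_as_deleted": sorted(set(mirror) - set(seen)),
    }


mirror = {"notes.txt": 2_000, "vm.img": 900_000}
source = {"notes.txt": 2_100, "vm.img": 1_200_000}  # vm.img grew past the limit
plan = plan_run(mirror, source, max_size=1_000_000)
print(plan)
```

In this model vm.img still exists in the source, yet the run cannot tell it apart from a file that was removed.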
> As far as intact history goes, that's a side issue here, isn't it?
No, it's not. That's the whole point.
If rdiff-backup didn't keep history, it could just remove the large file
and be done with it. However, rdiff-backup was designed to be able to
restore to previous points in time, for example to just before your
manager accidentally removed the almost-finished $200.000 tender document
that was due tomorrow.
So, files that are no longer in the source tree (or files that you have
excluded, either by name or by a size limit, no difference there) are not
just deleted, but rdiff-backup creates a so-called snapshot and moves that
to a proper place in the rdiff-backup-data directory. So, if you do need
that file again, it only needs to restore the snapshot.
That snapshot can later be deleted when you decide to remove parts of the
history kept by rdiff-backup. (--remove-older-than)
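A rough Python sketch of that idea follows. It is a toy under simplifying assumptions: ToyRepo and its methods are invented names, it stores whole old contents for every change (rdiff-backup stores diffs for changed files), and its remove_older_than only loosely mimics the real option.

```python
class ToyRepo:
    """Toy model: a 'current' mirror plus a list of reverse increments."""

    def __init__(self):
        self.mirror = {}      # path -> content, the 'current' tree
        self.increments = []  # (time, {path: old content, or None if absent})

    def backup(self, time, source):
        changes = {}
        for path in set(self.mirror) | set(source):
            old, new = self.mirror.get(path), source.get(path)
            if old != new:
                changes[path] = old  # keep what is needed to go back
        self.increments.append((time, changes))
        self.mirror = dict(source)

    def restore(self, time):
        tree = dict(self.mirror)
        # Undo every increment newer than `time`, newest first.
        for t, changes in reversed(self.increments):
            if t <= time:
                break
            for path, old in changes.items():
                if old is None:
                    tree.pop(path, None)
                else:
                    tree[path] = old
        return tree

    def remove_older_than(self, time):
        # Drop increments only needed to reach states before `time`.
        # Afterwards restore() just returns the oldest state still held.
        self.increments = [(t, c) for t, c in self.increments if t > time]


repo = ToyRepo()
repo.backup(1, {"tender.doc": "draft", "notes.txt": "a"})
repo.backup(2, {"notes.txt": "a"})  # tender.doc deleted (or excluded)
before = repo.restore(1)
repo.remove_older_than(2)
after = repo.restore(1)
```

Here `before` still contains tender.doc although the mirror does not, and `after` no longer does because the history holding it was removed.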
The normal 'current' backup tree always contains the exact same files as
the source tree. rdiff-backup never gzips files in the current tree.
Only the snapshots and diffs in the rdiff-backup-data directory can, at
the user's choice, be gzipped.
But...
I think your problem is not with the gzipping. I think you want to use
rdiff-backup in a way it was never designed to be used. So, instead of
commenting on several other "misunderstandings" in your email, I'll focus
on what I think triggered this discussion:
> That might not just avoid treating a file as if deleted on the original when
> it hasn't been, but support actions like running rdiff-backup at regular
> intervals during working hours just against smaller files, while running a
> daily backup of even the large stuff every night, without having to
> establish two redundant backup spaces to accommodate this.
That's just a Bad Idea (tm). The whole idea of "restore to a specific
point in time" implies that you then get back the tree as it was at the
time you specified. Not a tree with small files from that date/time, and
with large files from an earlier date.
You do have a few options to get what you want.
For example, you could do a two-stage backup, using rsync to regularly
sync the source tree to a shadow tree, and exclude-but-not-delete large
files. And then use rdiff-backup to backup the shadow tree right after
each rsync run. Overnight, run a full rsync and again a normal
rdiff-backup, and it will update the larger files as well.
This indeed uses a lot of extra disk space and thus sort of defeats the
purpose.
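For illustration, the shadow-tree step can be modelled in a few lines of Python. This is a toy with made-up names (daytime_sync, nightly_sync) and in-memory dicts instead of real trees; how to configure rsync to behave this way is left out.

```python
def daytime_sync(source, shadow, max_size):
    """Refresh small files in the shadow tree; keep large shadow copies as-is."""
    result = {}
    for path, (size, content) in source.items():
        if size <= max_size:
            result[path] = (size, content)  # small file: copy current version
        elif path in shadow:
            result[path] = shadow[path]     # large file: keep the stale copy
        # a large file not yet in the shadow tree waits for the nightly run
    return result  # paths no longer in the source are dropped


def nightly_sync(source):
    """Full sync: the shadow tree becomes an exact copy of the source."""
    return dict(source)


source = {"notes.txt": (10, "v2"), "vm.img": (5_000, "v2")}
shadow = {"notes.txt": (10, "v1"), "vm.img": (5_000, "v1")}
day = daytime_sync(source, shadow, max_size=1_000)
night = nightly_sync(source)
```

After the daytime step the shadow tree holds the new notes.txt next to the old vm.img, so a backup of the shadow tree sees an unchanged large file instead of a deleted one. The price is the second full copy of everything.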
So, why not just use both --max-file-size and --min-file-size on two
separate backup trees? That would exclude the large files from the
smallfiles-tree, and the small files from the largefiles-tree, so no
redundancy. And you can use different backup schedules for both trees.
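A small Python sketch of the split, with one assumption stated loudly: I read the two options as "exclude files larger than the limit" and "exclude files smaller than the limit". The function names and sample sizes are made up.

```python
def small_tree(source, limit):
    """Files left after a --max-file-size style filter (assumed: drop size > limit)."""
    return {path: size for path, size in source.items() if size <= limit}


def large_tree(source, limit):
    """Files left after a --min-file-size style filter (assumed: drop size < limit)."""
    return {path: size for path, size in source.items() if size >= limit}


source = {"prefs.js": 4_000, "exact.bin": 1_000_000, "vm.img": 16_000_000_000}
limit = 1_000_000
small = small_tree(source, limit)
large = large_tree(source, limit)
in_both = sorted(set(small) & set(large))
missing = sorted(set(source) - set(small) - set(large))
```

With the same limit for both runs no file is missed, and under the assumption above only a file of exactly the limit size could show up in both trees.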
To make things easier, I think I'd just create two backup trees, based
on file paths. Huge files with sizes like you mentioned usually show up
in well-defined places in a file system, and not just between a normal
user's mozilla preferences file and a list of recently opened documents.
So you could even use a --max-file-size for the normal backup tree, and
warn the users that they CAN use larger files there, but they will NOT be
backed up so no complaining if they get deleted, corrupted, or lost.
> Good points. But let me rephrase the claims more clearly. (Language can be
> too broad a brush for technical discussions.) If the user's goal is to
> compromise
[snip]
If you want to compromise, you don't get what you want, and also you get
things you don't want. That's not only a matter of language, it's just
something you don't want when designing a backup system. If you want
speed (assuming, for the sake of argument, that gzipping is your only
problem), just get larger disks. An extra 1TB of disk space costs way less
than changing rdiff-backup to something it was never designed to be.
Plus, gzipping might indeed take eons to complete on a 16GB file, but your
suggestion wouldn't do anything to improve the speed of:
- the part where librsync creates a local copy of the current version of
the file in the source tree
- the part where a diff is created to be able to go from the current
version to the previous version
- the part where that possibly large diff is stored into the
rdiff-backup-data directory.
(Where the first two might very well take even more time than gzipping the
file..)
Actually, your suggestion would only help for large files being deleted
(or excluded) from the source tree. For your suggestion to be really
useful, you would need to have a source tree that has this happening on a
regular basis. And in that case, the time spent in gzipping will be so
much less of a problem than the amount of disk space that will be used by
all the increments. (Or you would need to keep such a short history that
you shouldn't be using rdiff-backup at all, making this discussion moot
anyway.)
Maarten