[rdiff-backup-users] Wiki additions...
From: listserv . traffic
Subject: [rdiff-backup-users] Wiki additions...
Date: Thu, 26 Mar 2009 10:04:27 -0700
Since these issues seem to come up often, and they were issues I was
most concerned about, I thought I'd place them on the wiki.
Here's a draft: Comments welcome before I add them.
Did I state anything unclearly or incorrectly? Please save me from later
shame! :)
-Greg
---
Question:
What happens when I have to restore a file that has many reverse diffs to apply
to it?
Answer:
It will take the current version of the file and use the stored meta-data to
tell it how to apply all the reverse diff files going back to the date you
requested, provided that version exists.
It has to have all three parts:
1) The current version of the file as it existed when RDiff-Backup was last
run.
2) The meta-data that tells the system if/when/how to apply the reverse diffs.
3) All the reverse diffs themselves.
Question:
Does the system have to restore all the reverse diffs for a file? What if there
are dozens or even hundreds?
What if only one is broken? Is the whole process of "restoring" the file broken?
Answer:
Yes, the system has to apply all the reverse diffs that apply to the "version"
of the file you requested. If there were 200 reverse diffs, because the file
had changed over 200 rdiff-backup sessions, then yes, it will have to apply all
200 reverse diffs to get to the version of the file you want. If any of the
three parts of the system (current file, meta-data, or reverse diffs) is
missing, the process will break, and you won't get your file.
(There are ways to attempt to manually salvage the file, but these are far
outside the scope of this document. Suffice it to say that if any of the parts
needed (file/meta-data/rdiffs) are missing, RDiff-Backup isn't going to be able
to restore it automatically, and all bets are off. You'll be in deep weeds and
if you're lucky you might be able to get parts of your data back. Perhaps if
you're really super lucky and the missing reverse diffs overlap others *and*
you can finagle the restore process, you might get everything back. Or, if it's
just not your day, you won't get jack, you'll get fired, your dog will bite you
and you'll get rabies...)
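To make the serial dependency concrete, here is a toy sketch in Python. It is
not how RDiff-Backup stores or applies its increments (those are binary rdiff
deltas plus meta-data); the dict-based "diffs" and the function names are
made up purely for illustration. It only shows why one missing link stops the
walk backward.

```python
# Toy model only: a "reverse diff" here is a dict {line_index: older_line}.
# Real rdiff-backup increments are binary deltas; this is an illustration.

def apply_reverse_diff(content, diff):
    """Return the previous version by overwriting the lines named in diff."""
    older = list(content)
    for index, old_line in diff.items():
        older[index] = old_line
    return older

def restore(current, reverse_diffs, sessions_back):
    """Walk back sessions_back sessions, applying the newest diff first."""
    content = list(current)
    for step in range(sessions_back):
        diff = reverse_diffs[step]
        if diff is None:  # stands in for a missing or unreadable increment
            raise RuntimeError(
                f"reverse diff {step} is missing; cannot go further back"
            )
        content = apply_reverse_diff(content, diff)
    return content

current = ["a3", "b1"]
reverse_diffs = [{0: "a2"}, {0: "a1"}, {1: "b0"}]
print(restore(current, reverse_diffs, 3))  # prints ['a1', 'b0']
```

Replacing the middle entry of reverse_diffs with None makes restore() raise as
soon as it reaches that step, even though the oldest diff is still intact.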
Question:
Isn't it dangerous to have to rely on all those reverse diffs, especially when
they're being applied serially, and every single one of the reverse diffs has
to apply properly, in order to get back to the version I want?
Answer:
Yes, it is "dangerous" - though every definition of dangerous depends on your
perspective. (Just ask a BASE jumper about what's considered dangerous.) The
design decision was to keep only the differences, and because of limitations in
the rsync libraries it's impossible to merge rdiffs. While we're certainly not
trying to convince you to use RDiff-Backup and agree with our reasoning on
what's best and reasonable, we think reasonable trade-offs were made on
managing the resources used vs the advantages of redundancy.
Question:
OK, I like most of what I hear, but how can I be sure the whole system retains
its integrity? Is there a way to test all the parts of the system and make
sure they all work, and work properly? For example, can I have the system "self
test" the archive and let me know if any parts of it fail?
Answer:
Certainly. The "--verify-at-time xyz" switch is your friend. This switch, in
essence, does a full restore of the file to the time specified in "xyz." In
brief, it takes the current version of the file, and then uses the meta-data
and applicable reverse diffs to roll the file back to the date specified (i.e.
xyz). It then re-calculates the SHA-1 hash for the re-created file and checks
that newly calculated hash against the SHA-1 hash it stored for this file when
it was backed up on the date that corresponds with "xyz."
If any part of the process fails, rdiff-backup will exit with a non-zero
result. (And it should generate errors to the console...)
If meta-data is damaged, and it can't figure out how to apply the rdiffs, you
should get an error message.
If after rolling the file backward to date xyz, the check-sums don't match,
you'll get an error.
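The comparison step can be sketched in a few lines of Python. This is only an
illustration of "hash the re-created file and compare with the recorded
value"; the function names are made up, and it does not read RDiff-Backup's
own meta-data files.

```python
import hashlib

def sha1_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_recorded_hash(path, recorded_sha1):
    """True if the file's SHA-1 equals the hash recorded at backup time."""
    return sha1_of_file(path) == recorded_sha1.lower()
```

A mismatch means the rolled-back file is not byte-for-byte what was backed up
on that date, whichever piece of the chain caused it.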
Thus, to test the integrity of every piece of the system, pick a date for "xyz"
that is at least as old as the oldest rdiff session. This should, by
requirement, apply every reverse diff in the repository and all the meta-data.
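If you want to run this check from a script (a cron job, say), a minimal
Python wrapper might look like the following. It assumes rdiff-backup is on
the PATH and is invoked as "rdiff-backup --verify-at-time TIME REPOSITORY";
the repository path and the "1Y" time string in the usage note are placeholder
values, so check the time formats your version accepts.

```python
import subprocess

def build_verify_command(repository, at_time):
    """Build the argument list for a verify run (assumed invocation form)."""
    return ["rdiff-backup", "--verify-at-time", at_time, repository]

def verify_repository(repository, at_time):
    """Run the verify and report success based on the exit status."""
    result = subprocess.run(build_verify_command(repository, at_time))
    # As described above, a non-zero exit status signals a failed check.
    return result.returncode == 0
```

For example, verify_repository("/backups/home", "1Y") would return False if
rdiff-backup exits with a non-zero result for that hypothetical repository.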
While a successful result from "--verify-at-time xyz" isn't sufficient to
ensure that someone hasn't tampered with the rdiff repository in an attempt,
for example, to modify executable files, it is very strong evidence that
chance or bad luck hasn't damaged the system. The chance of a random SHA-1
collision for the same file is vanishingly small. (i.e. Two very similar files
having the same SHA-1 checksum but not being equal, by simple chance (not
malicious design), is exceedingly unlikely.)
Here's how R-DB works
RDiff-Backup "mirrors" the backed-up files, and for files that have changed
since the last "backup" it creates reverse diffs.
So, for a repository that covers week-day backups, once daily for a year, 200
diffs a year...
---
To roll back a file, you'll need a good current version (i.e. one that matches
the file at the time of the last RDiff-Backup run) and all the RDiffs, back to
the time RDiff made the target-date archive.
(i.e. You have a year of RDiff-backups, with 200 versions/diffs. You want a
file from a year ago that changed every single RDiff-Backup run [and thus has
changes in every single RDiff.])
Restoring that file will require the current version of the file as it was on
the last RDiff-Backup run, and every single RDiff archive will need to be valid
and uncorrupted to guarantee a successful restore.
Possible methods of verifying the integrity of the RDiff-backup archive...
---
Rough check of the archive...
You can probably ascertain the integrity of the existing RDiffs by checking the
integrity of the .gz files. [If the GZ is uncorrupted, it's likely the RDiff it
contains is OK too.]
However, this doesn't guarantee that all the correct increment data is
available. It just verifies that what data IS there is *probably* not
corrupted.
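A rough check of this kind can be scripted. The sketch below is plain Python,
not an RDiff-Backup feature: it walks a directory tree and tries to decompress
every file ending in .gz, collecting the ones that fail. The function name and
the idea of pointing it at the repository's data directory are assumptions for
illustration.

```python
import gzip
import os
import zlib

def find_corrupt_gz(root):
    """Return paths of .gz files under root that fail to decompress."""
    corrupt = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".gz"):
                continue
            path = os.path.join(folder, name)
            try:
                with gzip.open(path, "rb") as handle:
                    # Read to the end so the trailing CRC gets checked too.
                    while handle.read(1 << 20):
                        pass
            except (OSError, EOFError, zlib.error):
                corrupt.append(path)
    return corrupt
```

An empty result only says the compressed files that exist are readable; it
cannot notice increments that are missing altogether.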
The situation is a bit more complex than the above explanation: just because
you have, for example, the GZ file for the diff doesn't mean you have all the
required pieces to apply it properly, since more files are required to tell
RDiff how and what to restore than just the RDiff file...
A simple answer that you can be fairly certain of: If you haven't deleted or
modified *any* of the files in the RDiff repository, and all the GZ files pass
integrity checks, you're probably OK.
However, deleting or modifying ANY files in the repository will have serious
negative consequences for restoring files in that repository. You might still
be able to do so, but it will require you to hand-edit or create the pieces to
fake RDiff into doing the restore.
---
If you want to do a more complete test, you can do a "dry" restore back to the
earliest version in the repository.
This should apply all the available diffs to all appropriate files.
If there's a problem doing so, RDiff-Backup should throw an error, and by
examining the error you should be able to determine the problem.
[Or, alternatively, do a restore to the earliest critical version. i.e. You
have a year's worth of rdiffs, but only 90 days are critical. Doing a restore to
90 days ago would test the most critical pieces of the archive, and would be
less time and compute intensive than doing a full year.]
However, doing a "full" restore could consume a lot of disk-space and will be
time and compute intensive.
[Does anyone want to give an estimate of how time intensive this might be -
local disk to local disk?]
---