rdiff-backup-users

Re: The Verification Treadmill


From: Robert Nichols
Subject: Re: The Verification Treadmill
Date: Thu, 15 Feb 2024 14:05:33 -0600
User-agent: Mozilla Thunderbird

On 2/15/24 09:47, Dominic Raferd wrote:
...snip...
> So the only way to be confident about *all* the data in a repository is
> to use 'rdiff-backup verify' to verify each and every backup session in
> each repository; and this includes verifying the current 'mirror'
> session (even though it is held in the clear in the repository). This
> needs to be done with reasonable frequency to ensure that backed-up
> data has not deteriorated (e.g. through media bitrot).

That's the way I do it. My verification is done in conjunction with my periodic 
(~weekly) sync of my primary backup archives to separate media. I verify all of 
the new levels that are being synced plus at least one more level to ensure 
that the new levels mesh properly with the ones already synced.
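
For anyone scripting this, a minimal sketch of the "verify every session"
loop might look like the following. It assumes the long-standing option
spellings (--list-increments, --parsable-output, --verify-at-time,
--verify), assumes that the first field of the parsable increment listing
is seconds since the epoch (which --verify-at-time accepts as a time spec),
assumes a failed verification exits with non-zero status, and uses a
placeholder repository path; check all of that against your own
rdiff-backup version:

#!/bin/sh
# Sketch: verify every stored session in one repository, then the
# current mirror.  REPO is a placeholder path.
REPO=/backup/myrepo

# One line per increment; the first field is assumed to be the
# session time in seconds since the epoch.
rdiff-backup --list-increments --parsable-output "$REPO" |
awk '{print $1}' |
while read -r ts; do
    echo "verifying session $ts"
    rdiff-backup --verify-at-time "$ts" "$REPO" || echo "FAILED: session $ts"
done

# The current mirror gets its own pass.
rdiff-backup --verify "$REPO" || echo "FAILED: mirror"

Run it against each repository in turn, and against the copy on the
separate media after the sync.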

> All of which takes a lot of computing power and time, much of which is
> duplication of effort (because, as stated above, the verification of
> the earliest session in a repository confirms the integrity of all
> later versions of files that it contains, but it is not possible to
> exclude these files from re-verification for more recent sessions).

Actually, that is not sufficient to verify the intermediate levels. Say one
block of the reverse-diff file for backup level -3 gets corrupted. Because
each older level is reconstructed by applying the reverse diffs in sequence,
that corruption fails verification at level -3 and would normally carry into
the older levels as well. But if the diff for level -5 happens to replace
that same block of the file, then level -5 and every level before it will
verify correctly; only levels -3 and -4 will fail, so verifying just the
oldest session would still report a clean archive.

There is no substitute for verifying each and every level of the backup
archive. I have a script that verifies 8 levels in parallel on a system with
a lot of memory. Because those parallel runs are for the most part all
reading the same files, all but the first get the data from the kernel's
buffer cache and do not incur any I/O delay. I find that 8 verifications in
parallel finish almost as fast as a single one. I have 64GB of RAM to play
with, and my machine isn't doing much else while I'm syncing backups, so
YMMV. Trying to do this on a Raspberry Pi would be an entirely different
story.
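
If it is useful to anyone, the parallel variant of the loop sketched above
can be had with plain xargs. The -P 8 job count and the repository path are
placeholders, the same assumptions about the parsable increment listing and
exit codes apply, and since verification only reads the repository the
concurrent runs should not get in each other's way:

# Sketch: verify up to 8 sessions of one repository at a time.
REPO=/backup/myrepo

rdiff-backup --list-increments --parsable-output "$REPO" |
awk '{print $1}' |
xargs -P 8 -I{} rdiff-backup --verify-at-time {} "$REPO"

# xargs exits non-zero if any rdiff-backup invocation did;
# the current mirror still gets a separate pass.
rdiff-backup --verify "$REPO" || echo "FAILED: mirror"

Whether 8 is the right number depends, as above, on how much of the
repository ends up in the buffer cache.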
--
Bob Nichols     "NOSPAM" is really part of my email address.
                Do NOT delete it.




