
Re: [rdiff-backup-users] Interesting write-up of 'compare-by-hash'.


From: Ben Escoto
Subject: Re: [rdiff-backup-users] Interesting write-up of 'compare-by-hash'.
Date: Fri, 30 Jan 2004 14:07:45 -0800

>>>>> Greg Freemyer <address@hidden>
>>>>> wrote the following on Wed, 28 Jan 2004 14:08:05 -0500

> Since the hashing process is lossy (i.e. non-reversible), then it is
> possible that two totally different data sets could generate the same
> hash, and in turn invalidate the backup checksum check.
> 
> I don't know what the odds are of this happening with rdiff-backup.
> 
> I assume that they are exceedingly small, but not zero.

For rdiff (and thus rdiff-backup) it actually depends on the number of
blocks in the file, because there is no global SHA-1 or MD5 hash.  So a
2GB file that has all 2GB changed is more likely to cause a hash
collision than a changed 1KB file.
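To make the "more blocks, more risk" point concrete, here is a rough sketch. It is a simplified model, not librsync's actual matching logic: it assumes each block comparison is an independent chance of a false match with probability 2^-HASH_BITS. BLOCK_SIZE and HASH_BITS are hypothetical illustration values, not rdiff defaults.

```python
import math

def blocks_in(file_size: int, block_size: int) -> int:
    """Number of fixed-size blocks needed to cover file_size bytes."""
    return max(1, math.ceil(file_size / block_size))

def collision_probability(n_blocks: int, hash_bits: int) -> float:
    """P(at least one false block match), ASSUMING each of n_blocks
    comparisons independently collides with probability 2**-hash_bits."""
    p = 2.0 ** -hash_bits
    # 1 - (1 - p)**n, computed via log1p/expm1 so a tiny p is not rounded away
    return -math.expm1(n_blocks * math.log1p(-p))

BLOCK_SIZE = 2048   # hypothetical block size in bytes
HASH_BITS = 64      # hypothetical per-block strong-hash length in bits

for size in (1024, 2 * 1024**3):  # a 1KB file vs. a fully changed 2GB file
    n = blocks_in(size, BLOCK_SIZE)
    print(f"{size:>12} bytes, {n:>8} blocks: P ~ {collision_probability(n, HASH_BITS):.2e}")
```

Under this model the risk grows roughly linearly with the block count, so the 2GB file (about a million blocks here) is about a million times more exposed than the single-block 1KB file.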

I remember asking Donovan Baarda about this on the librsync list a while
ago, so if anyone is curious for more details they can look that up. The
upshot IIRC is that (for "random data") the odds of a collision are
around 2^-50 even for fairly large files.  This isn't as good as a 128-bit
global hash, but quite reasonable for practical use.
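A quick back-of-the-envelope comparison of the two figures follows. It takes the recalled 2^-50 per-file estimate at face value and treats the 128-bit case as a 2^-128 chance for random data. The workload in the usage line is purely hypothetical.

```python
import math

PER_FILE_ODDS_BITS = 50   # the "around 2^-50" per-file estimate recalled above
GLOBAL_HASH_BITS = 128    # a 128-bit whole-file hash, random-data assumption

p_blocks = 2.0 ** -PER_FILE_ODDS_BITS
p_global = 2.0 ** -GLOBAL_HASH_BITS

def expected_collisions(n_files: int, odds_bits: int) -> float:
    """Expected undetected collisions over n_files independent diffs,
    each failing with probability 2**-odds_bits (linearity of expectation)."""
    return n_files * 2.0 ** -odds_bits

print(f"block-level estimate: {p_blocks:.2e}")
print(f"128-bit global hash:  {p_global:.2e}")

# Hypothetical workload: a million large-file diffs a day for ten years
n = 10**6 * 365 * 10
print(expected_collisions(n, PER_FILE_ODDS_BITS), expected_collisions(n, GLOBAL_HASH_BITS))
```

Even under that heavy hypothetical workload, the expected number of collisions at 2^-50 stays in the millionths. That matches "not as good as a 128-bit hash, but quite reasonable for practical use."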


-- 
Ben Escoto


