
[rdiff-backup-users] Re: [librsync-users] more info on 25gig files


From: Donovan Baarda
Subject: [rdiff-backup-users] Re: [librsync-users] more info on 25gig files
Date: Fri, 06 May 2005 15:47:10 +1000

On Thu, 2005-05-05 at 20:59 -0700, Ben Escoto wrote:
> >>>>> Donovan Baarda <address@hidden>
[...]
> Ahh, rdiff-backup chooses the blocksize to be approximately 1/2000th
> of the length of the file, with a minimum of 512 bytes (see
> find_blocksize in Rdiff.py).  So if the file is 25 gigs large, perhaps
> Clint could try running rdiff with a blocksize of 13421568.
> 
> But I'm not sure how I ever came up with that formula, and probably
> there was no sound reasoning behind it.  Should I switch to the
> square-root thing (minimum blocksize 512, blocksize always multiple of
> 512?)?  I remember there being some discussion about this, but I
> probably never updated rdiff-backup with the correct function.
[...]
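
For reference, here is a minimal sketch of the 1/2000th rule described in
the quoted text. It is not the actual find_blocksize from Rdiff.py: the
function name blocksize_fraction, its parameters, and the choice to round
down to a multiple of 512 are assumptions, picked because they reproduce
the 13421568 figure quoted for a 25 gig file.

```python
def blocksize_fraction(file_len, divisor=2000, minimum=512, multiple=512):
    """Hypothetical sketch: about 1/divisor of the file length.

    Assumes the result is rounded down to a multiple of 512 and never
    drops below 512 bytes; the real rdiff-backup code may differ.
    """
    if file_len < minimum * divisor:
        return minimum
    # Integer division twice: first take 1/divisor, then round down
    # to the nearest multiple of `multiple`.
    return (file_len // divisor) // multiple * multiple


# 25 gigs, taken here as 25 * 2**30 bytes.
print(blocksize_fraction(25 * 2**30))
```

Under these assumptions the block count stays near 2000 no matter how
large the file is, so the blocks themselves grow linearly with file size.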

There was a long discussion on the rsync lists about the best heuristic
for blocksize and blocksum size (rsync also trims the strongsum size to
make the signature smaller, and it was getting blocksum collisions on
large files). I helped figure out the formulas for blocksize and
blocksum size that rsync now uses.

I forget the exact formula for the blocksum size (something to do with
the log2 of the number of blocks), but the blocksize was definitely the
sqrt of the filesize. There are lots of reasons why this is a good
heuristic... it compensates for the way several costs in the rsync
algorithm scale with file size (execution time, probability of
blocksum collision, etc.).

So yeah, I'd definitely switch to the sqrt thing, you will get a lot
less grief as files get larger.
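
A rough sketch of the sqrt heuristic, combined with the constraints Ben
suggested (minimum 512, always a multiple of 512). This is an
illustration only, not the code rsync or rdiff-backup uses: the names
blocksize_sqrt and block_count are made up here, and rounding down to the
multiple is an assumption.

```python
import math


def blocksize_sqrt(file_len, minimum=512, multiple=512):
    """Hypothetical sketch: blocksize of about sqrt(file_len).

    Rounded down to a multiple of 512 with a floor of 512 bytes.
    The rounding direction is an assumption, not taken from rsync.
    """
    root = math.isqrt(file_len)  # integer square root (Python 3.8+)
    return max(minimum, root // multiple * multiple)


def block_count(file_len, blocksize):
    """Number of blocks needed to cover the file (ceiling division)."""
    return -(-file_len // blocksize)


for size in (2**20, 2**30, 25 * 2**30):
    bs = blocksize_sqrt(size)
    print(size, bs, block_count(size, bs))
```

With this rule the blocksize and the number of blocks both grow roughly
as the square root of the file size, instead of the block count staying
fixed while the blocks themselves become huge.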

> Yes, that is correct, see _librsyncmodule.c.  If you want to test
> rdiff-backup's librsync stuff separately, you may want to check out
> python-rdiff, which is a simple port of rdiff to rdiff-backup's
> librsync extension module.  You can download it from the rdiff-backup
> CVS at:
[...]

I keep meaning to look at that code and update pysync and librsync to
match. The other one worth looking at that has recently appeared is
zsync. Something to do _after_ moving from Melbourne to Dublin :-)

-- 
Donovan Baarda <address@hidden>
http://minkirri.apana.org.au/~abo/




