rdiff-backup-users

Re: [rdiff-backup-users] multiple archive hardlink space saving?


From: Maarten Bezemer
Subject: Re: [rdiff-backup-users] multiple archive hardlink space saving?
Date: Tue, 18 Aug 2009 22:53:08 +0200 (CEST)

Hi,

On Tue, 18 Aug 2009, Joshua Jensen wrote:

> I ask because I have 2000 (yes, two thousand) machines to back up.  At
> least 90% of data is the same... basic Linux filesystems, mostly / and
> /var and /usr partitions.

Are all of those 2000 machines critical to day-to-day operations? I ran a computer lab of Linux boxes (admittedly not with 2000 machines), and I never backed up the individual machines. If a hard drive broke down, it was replaced and the machine was reinstalled from a central image (a tgz). Updates were also rolled out centrally, using patch levels, so we could rebuild the image after a number of patches and apply only the newer patches to machines that had been imaged later on. All users' files were stored on the network; the machines themselves held no user data, only things like /tmp and /var/tmp (with different clean-up policies).
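The patch-level scheme described above can be sketched as follows. This is a minimal illustration, not the setup Maarten actually used: it assumes patches carry a monotonically increasing integer level and that each machine (or image) records the level it is at. The names `Patch` and `pending_patches` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Patch:
    level: int  # hypothetical: monotonically increasing patch number
    name: str


def pending_patches(current_level: int, patches: Iterable[Patch]) -> List[Patch]:
    """Return the patches newer than current_level, oldest first.

    A machine freshly installed from an image built at level N starts at N,
    so it only needs the patches with a level greater than N.
    """
    return sorted((p for p in patches if p.level > current_level),
                  key=lambda p: p.level)


if __name__ == "__main__":
    patches = [Patch(i, f"patch-{i}") for i in (5, 1, 3, 2, 4)]
    # A machine imaged from a level-3 image still needs patches 4 and 5.
    print([p.name for p in pending_patches(3, patches)])
```

Rebuilding the image periodically raises its base level, which keeps the list of patches a newly imaged machine has to replay short.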

If you're saying that at least 90% of the data on all machines is the same, then those files should not be backed up using rdiff-backup. Do you use some kind of package management software (dpkg, rpm)? Then you could find a way to save each machine's package-management state, and only back up the files that are not managed by package management (think /etc and /usr/local). There's no point in incrementally backing up 2000 copies of a Linux distro...
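On a dpkg-based system, the "only back up what package management doesn't cover" idea might look roughly like this. It is a sketch under stated assumptions: it relies on dpkg keeping one `*.list` file per installed package under `/var/lib/dpkg/info`, each listing the paths that package installed. The package state itself can be saved with `dpkg --get-selections` (or `rpm -qa` on RPM systems). Note that modified configuration files in /etc are still "managed" by this test, so /etc is probably worth backing up wholesale regardless. The function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Iterator, Set


def dpkg_managed_paths(info_dir: str = "/var/lib/dpkg/info") -> Set[str]:
    """Collect every path listed in dpkg's per-package *.list files.

    Assumption: one installed path per line, as dpkg writes them.
    """
    managed: Set[str] = set()
    for list_file in Path(info_dir).glob("*.list"):
        with open(list_file, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                path = line.rstrip("\n")
                if path:
                    managed.add(path)
    return managed


def unmanaged_files(root: str, managed: Set[str]) -> Iterator[str]:
    """Yield files under root that no installed package claims."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            if full not in managed:
                yield full


if __name__ == "__main__":
    managed = dpkg_managed_paths()
    # Candidate files for the per-machine backup, e.g. local additions.
    for path in unmanaged_files("/usr/local", managed):
        print(path)
```

The resulting list (plus the saved package selections) is what would still need a per-machine backup; everything else can be reinstalled from the distribution.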


Having said that, it might make sense to make rdiff-backup a bit smarter. But most improvements would probably also require changes to librsync, since that's the part that does most of the work. As for hard links, what exactly are you suggesting? Relying on the directory tree structure might be usable for you, but would it help other people? Or maybe keep a list of files and checksums and check whether there's a matching file? That would mean an extra level of bookkeeping, probably prone to errors, AND with practical issues when doing multiple concurrent rdiff-backup runs to different subdirectories. (Which you probably want to do, since backing up 2000 machines sequentially will result in a backup interval that's way too long to be useful anyway.)
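To make the "list of files and checksums" idea concrete, here is a minimal sketch of checksum-based hard-link deduplication over a backup tree. It is not something rdiff-backup does; `hardlink_duplicates` and the in-memory `index` are hypothetical. It also shows why the bookkeeping is fragile: hard-linked copies share one inode, so owner, permissions and mtime are those of the first copy seen, and anything that later modifies one file in place changes every linked copy. The plain dict index is not shared between processes, so concurrent runs would need a shared, locked index.

```python
import hashlib
import os
from typing import Dict, Optional


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_duplicates(root: str, index: Optional[Dict[str, str]] = None) -> int:
    """Replace files whose content is already in index with hard links.

    index maps a content digest to the first path seen with that content;
    pass the same dict for several trees to deduplicate across them.
    Returns the number of files replaced by links.
    """
    if index is None:
        index = {}
    replaced = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            digest = file_digest(path)
            original = index.get(digest)
            if original is None:
                index[digest] = path
                continue
            if os.path.samefile(original, path):
                continue  # already linked
            # Link under a temporary name, then atomically swap it in.
            tmp = path + ".dedup-tmp"
            os.link(original, tmp)
            os.replace(tmp, path)
            replaced += 1
    return replaced
```

A production version would at least compare metadata as well as content before linking, and persist the index between runs.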


Regards,
 Maarten



