Re: [rdiff-backup-users] Restarting development ... or starting over
From: Daniel Miller
Subject: Re: [rdiff-backup-users] Restarting development ... or starting over
Date: Mon, 5 Apr 2010 14:25:27 -0400
I wasn't really prepared to make this announcement so soon, but now seems like
a good time to let the community know. I have been working on a new
implementation of rdiff-backup for about a month, ever since I dug into the
current codebase and found its quality disappointing. What I have right now is
functional and works on simple cases, but it does not yet cover the broad
range of features currently offered by rdiff-backup. I could use some help
bringing it up to par if others are interested in the path I have taken. While
I have used the current codebase for direction and inspiration, I have started
with a clean slate for several reasons:
- An automated test suite makes adding new features and long-term maintenance
much easier. Adding this to the current codebase is both hard and boring. One
thing that makes it very hard to write tests for the current codebase is the
widespread use of globals. My new implementation has been developed using
test-driven development (TDD), with globals kept to a minimum (e.g. for loggers
and constants).
- The current repository layout has a critical design flaw that causes
performance degradation as a repository grows. Most difference information is
stored in a single file tree (rdiff-backup-data/increments) whose structure is
very similar to that of the mirror. The problem is that as files are added,
deleted, and changed, the directories in the increments tree keep growing, so
it takes longer and longer to list their contents. This performance problem is
negligible in small-to-medium sized
backup sets, but becomes apparent in very large backup sets as the number of
increments grows. I have redesigned the repository layout in my new
implementation to eliminate this performance issue. Note that I do not know for
sure if my new layout will completely eliminate this problem since I have not
tested it yet with a very large backup set over a long period of time.
- When the current version of rdiff-backup fails, it often aborts completely,
leaving the repository in a state that needs to be rolled back to the previous
backup state in order to continue using it. While this is a good conservative
approach, it potentially results in the loss of difference data that could
otherwise be saved. I have designed my new version to recover better from
errors--simply logging unexpected errors and skipping the current task rather
than aborting completely. I also have plans to make it possible to retain
incremental data from a failed backup rather than simply discarding it.
- There is currently no (efficient) way to do a complete verification of all
data in a repository. My new version was designed with this as a requirement.
- Although it is not implemented yet, I have some ideas of how to make use of
multiple cores to speed up rdiff-backup once the initial backup has been
created. Backups after the first one (which is usually I/O-bound) are often
CPU-bound, so spreading the work across multiple cores could speed them up.
- Another feature that is planned but not yet implemented is the ability to
remove all traces of selected files from a backup repository. This should be a
built-in feature of rdiff-backup, since having to remove files that were backed
up by mistake is a common occurrence. Currently this can only be done by hand,
which is very error-prone and which I find unacceptable.
- Did I mention that the new version has been developed from the ground up with
full Unicode support?
Please note that this new version will obviously not be backward compatible
with older rdiff-backup repositories. While a tool could conceivably be written
to convert an old repository to the new format, I have no desire to do so, and
I doubt that anyone else will either...
I have developed this new version using git for version control, which I plan
to continue using. I am hoping to put it up on GitHub soon.
~ Daniel