Hello all:
I have a series of rdiff-backup jobs that run every day to back up 10 remote
sites and a total of 14 different servers. It seems that each day at least one
of the backups fails. I have been working at getting these to run flawlessly
for 6 months, but it seems beyond my grasp. In the last week I thought I was
hot on the trail of a 'perfect run', but now I'm not so sure. For the past few
days I have been having trouble with the same two servers' backups. These
are push-type backups (as are all my backup jobs): the remote servers run
backup scripts that push via rdiff-backup to, in this case, two different
destination/backup servers.
In the first case, an older Gentoo Linux system (running v1.0.5; the
destination has v1.0.4), the following commands:
rdiff-backup --force --print-statistics --include /etc --include /home \
    --include /var --include /root --exclude / / \
    root@<servername>::/home/backups/dor
rdiff-backup --force --remove-older-than 2M \
    root@<servername>::/home/backups/dor
(I added the --force option to see whether it would clear up the regression
problem; it didn't.)
returned this as output:
Previous backup seems to have failed, regressing destination now.
Traceback (most recent call last):
  File "/usr/bin/rdiff-backup", line 23, in <module>
    rdiff_backup.Main.Main(sys.argv[1:])
  File "/usr/local/lib/python2.5/site-packages/rdiff_backup/Main.py", line 285, in Main
    take_action(rps)
  File "/usr/local/lib/python2.5/site-packages/rdiff_backup/Main.py", line 255, in take_action
    elif action == "backup": Backup(rps[0], rps[1])
  File "/usr/local/lib/python2.5/site-packages/rdiff_backup/Main.py", line 299, in Backup
    backup_final_init(rpout)
  File "/usr/local/lib/python2.5/site-packages/rdiff_backup/Main.py", line 396, in backup_final_init
    checkdest_if_necessary(rpout)
  File "/usr/local/lib/python2.5/site-packages/rdiff_backup/Main.py", line 911, in checkdest_if_necessary
    dest_rp.conn.regress.Regress(dest_rp)
  File "/usr/local/lib/python2.5/site-packages/rdiff_backup/connection.py", line 445, in __call__
    return apply(self.connection.reval, (self.name,) + args)
  File "/usr/local/lib/python2.5/site-packages/rdiff_backup/connection.py", line 367, in reval
    if isinstance(result, Exception): raise result
IOError: [Errno None] None: None
Traceback (most recent call last):
  File "/usr/bin/rdiff-backup", line 23, in ?
    rdiff_backup.Main.Main(sys.argv[1:])
  File "/usr/lib/python2.4/site-packages/rdiff_backup/Main.py", line 285, in Main
    take_action(rps)
  File "/usr/lib/python2.4/site-packages/rdiff_backup/Main.py", line 253, in take_action
    connection.PipeConnection(sys.stdin, sys.stdout).Server()
  File "/usr/lib/python2.4/site-packages/rdiff_backup/connection.py", line 352, in Server
    self.get_response(-1)
  File "/usr/lib/python2.4/site-packages/rdiff_backup/connection.py", line 314, in get_response
    try: req_num, object = self._get()
  File "/usr/lib/python2.4/site-packages/rdiff_backup/connection.py", line 230, in _get
    raise ConnectionReadError("Truncated header string (problem "
rdiff_backup.connection.ConnectionReadError: Truncated header string (problem probably originated remotely)
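Since the failure above happens inside the regress step (checkdest_if_necessary calling regress.Regress) rather than in the backup proper, one way to narrow it down might be to run the regress by itself, directly on the destination server, so the SSH pipe is out of the picture. The sketch below is only that, a sketch: it assumes the installed rdiff-backup accepts the --check-destination-dir option described in its man page, and that a Python 3 interpreter is available on the destination (separate from the Python 2.x that rdiff-backup itself uses there). The helper names build_regress_command and run_and_log are made up for illustration.

```python
import subprocess
import sys


def build_regress_command(repo_path, verbosity=5):
    """Build the argv for a standalone regress of a local repository.

    Assumption: this rdiff-backup version supports --check-destination-dir;
    check `man rdiff-backup` on the destination before relying on it.
    """
    return ["rdiff-backup", "-v%d" % verbosity,
            "--check-destination-dir", repo_path]


def run_and_log(argv, log_path):
    """Run argv, write combined stdout/stderr to log_path, return the exit code."""
    with open(log_path, "wb") as log:
        proc = subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 regress_check.py /home/backups/dor /tmp/regress.log
    repo, log_file = sys.argv[1], sys.argv[2]
    sys.exit(run_and_log(build_regress_command(repo), log_file))
```

If the regress also fails when run locally like this, the log should at least show the destination-side traceback without the "Truncated header string" noise from the connection.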
At some point recently (3/20) this backup worked. Then it started to fail,
giving 'regressing destination' errors each time it has run since. This is
the same backup I posted about recently, where I had to 'pull the wool' over
rdiff-backup's eyes because of a server date malfunction. The workaround
seemed to be renaming the current metadata file to a time prior to the next
run of rdiff-backup. That worked in that it stopped complaining about too many
current mirror files, but it also left rdiff-backup unable to 'see' the
metadata file, so it used the filesystem instead. Perhaps these problems are
related? If so, any ideas on how to get it working again would be greatly
appreciated. There should be two months of increments stored in the
repository, and I don't want to lose those by starting over.
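To make the state of the repository after the renaming workaround easier to describe, here is a rough sketch that lists what is in the rdiff-backup-data directory on the destination. It assumes the usual naming scheme (current_mirror.<timestamp>.data markers and mirror_metadata.<timestamp>.snapshot.gz or .diff.gz files), and it rests on my understanding that more than one current_mirror marker is what makes rdiff-backup decide the previous run failed. The function names are illustrative, not part of rdiff-backup, and the script only reads the directory.

```python
import os
import re
import sys

# Assumed file naming inside rdiff-backup-data (not checked against every version).
MARKER_RE = re.compile(r"^current_mirror\.(?P<ts>.+)\.data$")
METADATA_RE = re.compile(
    r"^mirror_metadata\.(?P<ts>.+?)\.(?:snapshot|diff)(?:\.gz)?$")


def summarize_repo(repo_path):
    """Return (marker_timestamps, metadata_timestamps) found in the repository."""
    data_dir = os.path.join(repo_path, "rdiff-backup-data")
    markers, metadata = [], []
    for name in sorted(os.listdir(data_dir)):
        marker = MARKER_RE.match(name)
        meta = METADATA_RE.match(name)
        if marker:
            markers.append(marker.group("ts"))
        elif meta:
            metadata.append(meta.group("ts"))
    return markers, metadata


def report(repo_path):
    """Build a short, human-readable summary of markers and metadata files."""
    markers, metadata = summarize_repo(repo_path)
    lines = ["current_mirror markers: %s" % (", ".join(markers) or "none")]
    if len(markers) > 1:
        lines.append("more than one marker: the last run probably did not finish")
    for ts in markers:
        if ts not in metadata:
            lines.append("no mirror_metadata file matches marker %s" % ts)
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python3 repo_state.py /home/backups/dor
    print(report(sys.argv[1]))
```

A marker with no matching mirror_metadata timestamp would be consistent with the renamed metadata file no longer being found.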
The second failed backup is a brand-new install of Ubuntu 8.10 running
rdiff-backup v1.1.16, pushing backups to another fresh 8.10 install also
running v1.1.16, using the following commands:
rdiff-backup --force --print-statistics --exclude-special-files \
    --include /etc --include /home --include /var/www --exclude /var \
    --include /root --exclude / / \
    root@<servername2>::/home/backups/images2
rdiff-backup --force --remove-older-than 2M \
    root@<servername2>::/home/backups/images2
(Again, I added the --force option to see whether it would stop regressing...)
returned this output:
Previous backup seems to have failed, regressing destination now.
Exception 'CRC check failed' raised of class '<type 'exceptions.IOError'>':
  File "/var/lib/python-support/python2.5/rdiff_backup/Main.py", line 302, in error_check_Main
    try: Main(arglist)
  File "/var/lib/python-support/python2.5/rdiff_backup/Main.py", line 322, in Main
    take_action(rps)
  File "/var/lib/python-support/python2.5/rdiff_backup/Main.py", line 278, in take_action
    elif action == "backup": Backup(rps[0], rps[1])
  File "/var/lib/python-support/python2.5/rdiff_backup/Main.py", line 341, in Backup
    backup.Mirror_and_increment(rpin, rpout, incdir)
  File "/var/lib/python-support/python2.5/rdiff_backup/backup.py", line 51, in Mirror_and_increment
    DestS.patch_and_increment(dest_rpath, source_diffiter, inc_rpath)
  File "/var/lib/python-support/python2.5/rdiff_backup/connection.py", line 447, in __call__
    return apply(self.connection.reval, (self.name,) + args)
  File "/var/lib/python-support/python2.5/rdiff_backup/connection.py", line 369, in reval
    if isinstance(result, Exception): raise result
Traceback (most recent call last):
  File "/usr/bin/rdiff-backup", line 23, in <module>
    rdiff_backup.Main.error_check_Main(sys.argv[1:])
  File "/var/lib/python-support/python2.5/rdiff_backup/Main.py", line 302, in error_check_Main
    try: Main(arglist)
  File "/var/lib/python-support/python2.5/rdiff_backup/Main.py", line 322, in Main
    take_action(rps)
  File "/var/lib/python-support/python2.5/rdiff_backup/Main.py", line 278, in take_action
    elif action == "backup": Backup(rps[0], rps[1])
  File "/var/lib/python-support/python2.5/rdiff_backup/Main.py", line 341, in Backup
    backup.Mirror_and_increment(rpin, rpout, incdir)
  File "/var/lib/python-support/python2.5/rdiff_backup/backup.py", line 51, in Mirror_and_increment
    DestS.patch_and_increment(dest_rpath, source_diffiter, inc_rpath)
  File "/var/lib/python-support/python2.5/rdiff_backup/connection.py", line 447, in __call__
    return apply(self.connection.reval, (self.name,) + args)
  File "/var/lib/python-support/python2.5/rdiff_backup/connection.py", line 369, in reval
    if isinstance(result, Exception): raise result
IOError: CRC check failed
Fatal Error: Lost connection to the remote system
The last line seems like a big issue. Is there any further detail to be had
for the lost connection error? I've tried running rdiff-backup at both the -v5
and -v7 levels, but neither gave me any more information about it, and the
error recurs on every run. This backup set (241GB) took several tries to run
properly; however, it did complete successfully on 3/23 (after running for 23
hours). Since then it has thrown the 'Previous backup seems to have failed,
regressing destination' error each time. I have the network almost to myself
this week, so there's not a lot of extra traffic impeding packet flow and no
obvious reason for a lost connection (i.e., the link has not seemed to go
down, at least not that Cacti or Nagios noticed).
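On the 'CRC check failed' part: that is the message Python's gzip module gives when a compressed file's stored checksum does not match its contents. So one possibility (an assumption on my part, not something the traceback proves) is a damaged .gz file in the destination's rdiff-backup-data directory, perhaps left over from one of the interrupted runs, rather than a network problem. A minimal sketch for checking that, to be run on the destination with Python 3; find_bad_gzip is a made-up helper name, and it only reads files.

```python
import gzip
import os
import sys
import zlib


def find_bad_gzip(top):
    """Walk `top` and return [(path, error_text)] for .gz files that fail to read."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in sorted(filenames):
            if not name.endswith(".gz"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with gzip.open(path, "rb") as fh:
                    # Read to the end in chunks; the CRC is only verified
                    # once the end of the compressed stream is reached.
                    while fh.read(1024 * 1024):
                        pass
            except (OSError, EOFError, zlib.error) as exc:
                bad.append((path, str(exc)))
    return bad


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage:
    #   python3 gzcheck.py /home/backups/images2/rdiff-backup-data
    for path, err in find_bad_gzip(sys.argv[1]):
        print("%s: %s" % (path, err))
```

On a 241GB repository, pointing this at rdiff-backup-data alone (metadata and increments) is probably enough to start with; what to do with any file it flags is a separate question.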
Thanks in advance for any help on either of these.
~bob