From: Steven Willoughby
Subject: Re: [rdiff-backup-users] Problem with Detection of Multiple rdiff-backup instances
Date: Thu, 24 Sep 2009 19:16:48 -0600
User-agent: Thunderbird 2.0.0.23 (X11/20090817)

Dean Cording wrote:
I've come across an issue with the way that rdiff-backup ensures that only one server is accessing a backup dataset.
...
Recently I had a backup fail, probably because of a network outage. All subsequent backups refuse to run because rdiff-backup believes the failed rdiff-backup instance is still running, even though this is clearly impossible because it is a totally different instance of the virtual server.

This had me stumped for a while but I finally figured out what is happening.

Because I start a new virtual server instance each time and I run the backup from a script, everything happens in a consistent order. As a result the instance of rdiff-backup running on the server for each backup session almost always has the same PID. So when a backup fails, the subsequent backup looks at the metadata, finds the PID of the failed backup and sees that that PID is still running, not realising that the other instance is actually itself.
A cursory look at regress.py seems to confirm this behavior. Specifically, check_pids() says:

    if pid is not None and pid_running(pid):

This could say:

    if pid is not None and pid != os.getpid() and pid_running(pid):
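As a self-contained illustration of the check proposed above, here is a minimal sketch. It is not rdiff-backup's actual code: pid_running() is reimplemented here on the assumption that it simply tests whether a process exists, and live_foreign_pids() and its parameters are hypothetical names made up for this example. It assumes a POSIX system.

```python
import os


def pid_running(pid):
    """Return True if a process with this PID exists (POSIX only).

    Stand-in for the helper of the same name mentioned above; the real
    one in rdiff-backup may work differently.
    """
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def live_foreign_pids(recorded_pids, own_pid=None):
    """Return the recorded PIDs that are running and are not this process.

    Hypothetical helper: recorded_pids stands for whatever PIDs a
    previous session left in its metadata.
    """
    if own_pid is None:
        own_pid = os.getpid()
    return [
        pid
        for pid in recorded_pids
        # Compare by value (!=); identity comparison on ints is unreliable.
        if pid is not None and pid != own_pid and pid_running(pid)
    ]
```

With this check, a stale record that happens to carry the current process's own PID no longer counts as another live instance.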
I'm not sure of a way of working around this problem as the virtual machine is always started from a known state and hasn't been running long enough to build up any entropy to generate unique random numbers between different sessions.
The current time adds a little randomness. A silly workaround would be to call the following perl script before running rdiff-backup:

    #!/usr/bin/perl
    `/bin/true` for 0..int(rand(100));

This will increase the pid and should stop your job from failing continuously.
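The same idea can be sketched in Python, for setups where that is handier than perl. This is only a rough equivalent: burn_pids() and max_count are names invented here, it relies on os.fork() (so POSIX only), and it assumes the kernel hands out PIDs sequentially, as Linux does by default. Python's random module seeds itself from the operating system's randomness source, falling back to the clock.

```python
import os
import random


def burn_pids(max_count=100):
    """Fork a random number of short-lived children to advance the PID counter.

    Returns how many children were created (between 1 and max_count).
    """
    count = random.randint(1, max_count)
    for _ in range(count):
        child = os.fork()
        if child == 0:
            os._exit(0)  # child: exit immediately without cleanup handlers
        os.waitpid(child, 0)  # parent: reap the child to avoid zombies
    return count


if __name__ == "__main__":
    burn_pids()
```

Run it right before rdiff-backup, the same way as the perl script.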
Steven