rdiff-backup-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [rdiff-backup-users] Delay between EndTime and process end - followu


From: Chris Wilson
Subject: Re: [rdiff-backup-users] Delay between EndTime and process end - followup
Date: Mon, 12 Aug 2013 14:29:18 +0100 (BST)
User-agent: Alpine 2.02 (DEB 1266 2009-07-14)

Hi Laurent,

On Mon, 12 Aug 2013, Laurent De Buyst wrote:

Two weeks ago I sent a mail to this list asking about some strange delays I was seeing between the EndTime as logged byt rdiff-backup and the actual end of the running process.

Since then, I've been digging deeper and I'd like to ask once more if anyone can tell me what's going on and how I might be able to 'fix' this.

See, I ran my backups using -v9 for a few days and this has shown in greater detail when the delay happens.

2013/08/12 06:00:56   Mon Aug 12 05:56:03 2013  Copying inc attrs from ... to 
...
2013/08/12 06:00:56   Mon Aug 12 05:56:03 2013  Setting time of ... to 
1376107515
2013/08/12 06:00:56   Mon Aug 12 05:57:36 2013  Renaming ... to 
.../backup/home/domino/data/webadmin.ntf
...
2013/08/12 06:00:56   Mon Aug 12 06:00:56 2013  Setting time of .../backup/tmp 
to 1376245594
2013/08/12 06:00:56   Mon Aug 12 06:00:56 2013  Copying attributes from () to 
.../backup
2013/08/12 07:17:03   Mon Aug 12 06:00:56 2013  Setting time of .../backup to 
1371199946
2013/08/12 07:17:03   Mon Aug 12 06:01:07 2013  Touching 
.../backup/rdiff-backup-data/extended_attributes.2013-08-12T00:53:28+02:00.snapshot
2013/08/12 07:17:03   Mon Aug 12 06:02:41 2013  Touching 
.../backup/rdiff-backup-data/access_control_lists.2013-08-12T00:53:28+02:00.snapshot
2013/08/12 07:17:03   Mon Aug 12 06:03:26 2013  Writing mirror_metadata diff
2013/08/12 07:17:03   Mon Aug 12 06:04:00 2013  Deleting 
.../backup/rdiff-backup-data/mirror_metadata.2013-08-11T00:55:43+02:00.snapshot.gz
2013/08/12 07:17:03   --------------[ Session statistics ]--------------
[...]
2013/08/12 07:17:03   --------------------------------------------------
2013/08/12 07:17:03   Mon Aug 12 07:17:02 2013  Deleting 
.../backup/rdiff-backup-data/current_mirror.2013-08-11T00:55:43+02:00.data

Now, the first timestamp comes from the wrapper I'm using, while the second 
ones are from rdiff-backup.

The first thing I've noticed even before is that output from rdiff-backup seems to come in bursts. You can see that it's doing things at 05:56 that only get logged at 06:00. This seems to be normal operational mode, I assume there's a cache involved somewhere.

Depending on how your wrapper runs rdiff-backup, if you are feeding the results into a pipe (for example, rdiff-backup | wrapper-script) then they will be buffered by the OS. It's possible to turn that buffering off if you need to. But I would say, just don't trust the timestamps added by your wrapper script. The timestamps from rdiff-backup contain the time when the message was generated.

Bearing that in mind, the delay actually seems to come here:

2013/08/12 07:17:03   Mon Aug 12 06:04:00 2013  Deleting
.../backup/rdiff-backup-data/mirror_metadata.2013-08-11T00:55:43+02:00.snapshot.gz
2013/08/12 07:17:03   --------------[ Session statistics ]--------------
[...]
2013/08/12 07:17:03   --------------------------------------------------
2013/08/12 07:17:03   Mon Aug 12 07:17:02 2013  Deleting
.../backup/rdiff-backup-data/current_mirror.2013-08-11T00:55:43+02:00.data

Since I doubt that the delete operations take a long time (unless you're running on a weird distributed filesystem, or a very heavily loaded fileserver, in which case all bets are off), I'd say the delay is in calculating the session statistics.

I don't know quite what it does, but scanning the code should help you, or strace()ing the process while it's in this delaying mode. It might be stat()ing 48,000 files. It shouldn't take hours to do that, but if it's over a network, with high latency, it might do.

You could run rdiff-backup manually without any pipes, so that you can see the output as it's generated and know when "deleting snapshot.gz" happens. After that, it's calculating session statistics. Then attach strace to the process and see what it's doing and which operations take a long time.

And as long as I don't know exactly what it is, I can't really get to work on this because I don't know if I need more ram, more CPU power, more CPU speed, faster disks, ...

You could look at which resource is most heavily used when the server is running the backup jobs. top can help with that. Is the CPU at 100%? Is it much less than 100%, but you have a high iowait time? Do you have a lot of swap usage? Does the swap usage grow while the backups are running? Does vmstat show more blocks in/out or swap in/out while the backups are running?

Cheers, Chris.
--
_____ __     _
\  __/ / ,__(_)_  | Chris Wilson <address@hidden> Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Ruby/Perl/SQL Developer |
\__/_/_/_//_/___/ | We are GNU : free your mind & your software |

reply via email to

[Prev in Thread] Current Thread [Next in Thread]