From: Chris Wilson
Subject: Re: [rdiff-backup-users] Delay between EndTime and process end - followup
Date: Mon, 12 Aug 2013 14:29:18 +0100 (BST)
User-agent: Alpine 2.02 (DEB 1266 2009-07-14)
Hi Laurent,
On Mon, 12 Aug 2013, Laurent De Buyst wrote:
Two weeks ago I sent a mail to this list asking about some strange
delays I was seeing between the EndTime as logged by rdiff-backup and
the actual end of the running process.
Since then, I've been digging deeper and I'd like to ask once more if
anyone can tell me what's going on and how I might be able to 'fix'
this.
See, I ran my backups using -v9 for a few days and this has shown in
greater detail when the delay happens.
2013/08/12 06:00:56 Mon Aug 12 05:56:03 2013 Copying inc attrs from ... to
...
2013/08/12 06:00:56 Mon Aug 12 05:56:03 2013 Setting time of ... to
1376107515
2013/08/12 06:00:56 Mon Aug 12 05:57:36 2013 Renaming ... to
.../backup/home/domino/data/webadmin.ntf
...
2013/08/12 06:00:56 Mon Aug 12 06:00:56 2013 Setting time of .../backup/tmp
to 1376245594
2013/08/12 06:00:56 Mon Aug 12 06:00:56 2013 Copying attributes from () to
.../backup
2013/08/12 07:17:03 Mon Aug 12 06:00:56 2013 Setting time of .../backup to
1371199946
2013/08/12 07:17:03 Mon Aug 12 06:01:07 2013 Touching
.../backup/rdiff-backup-data/extended_attributes.2013-08-12T00:53:28+02:00.snapshot
2013/08/12 07:17:03 Mon Aug 12 06:02:41 2013 Touching
.../backup/rdiff-backup-data/access_control_lists.2013-08-12T00:53:28+02:00.snapshot
2013/08/12 07:17:03 Mon Aug 12 06:03:26 2013 Writing mirror_metadata diff
2013/08/12 07:17:03 Mon Aug 12 06:04:00 2013 Deleting
.../backup/rdiff-backup-data/mirror_metadata.2013-08-11T00:55:43+02:00.snapshot.gz
2013/08/12 07:17:03 --------------[ Session statistics ]--------------
[...]
2013/08/12 07:17:03 --------------------------------------------------
2013/08/12 07:17:03 Mon Aug 12 07:17:02 2013 Deleting
.../backup/rdiff-backup-data/current_mirror.2013-08-11T00:55:43+02:00.data
Now, the first timestamp on each line comes from the wrapper I'm using,
while the second one comes from rdiff-backup itself.
The first thing I noticed, even before this, is that output from
rdiff-backup seems to come in bursts. You can see that it's doing things
at 05:56 that only get logged at 06:00. This seems to be normal
operational mode; I assume there's a cache involved somewhere.
Depending on how your wrapper runs rdiff-backup, if you are feeding the
results into a pipe (for example, rdiff-backup | wrapper-script) then they
will be buffered by the OS. It's possible to turn that buffering off if
you need to. But I would say, just don't trust the timestamps added by
your wrapper script. The timestamps from rdiff-backup contain the time
when the message was generated.
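For example, on Linux you can force line buffering by wrapping the command with coreutils' stdbuf. This is only a sketch: the rdiff-backup arguments below are placeholders, not your actual command, and the inline timestamping loop stands in for whatever your wrapper does.

```shell
#!/bin/sh
# Force line-buffered stdout/stderr even though they feed a pipe,
# so each log line reaches the wrapper as soon as it is generated.
# The source/destination arguments here are hypothetical placeholders.
stdbuf -oL -eL rdiff-backup -v9 /srcdir /backup 2>&1 \
    | while IFS= read -r line; do
          # Prefix each line with the time the wrapper actually saw it.
          printf '%s %s\n' "$(date '+%Y/%m/%d %H:%M:%S')" "$line"
      done
```

Without stdbuf, the OS typically switches the writer to block buffering (often 4-8 KB) when stdout is a pipe, which is exactly the burst pattern you're seeing.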
Bearing that in mind, the delay actually seems to come here:
2013/08/12 07:17:03 Mon Aug 12 06:04:00 2013 Deleting
.../backup/rdiff-backup-data/mirror_metadata.2013-08-11T00:55:43+02:00.snapshot.gz
2013/08/12 07:17:03 --------------[ Session statistics ]--------------
[...]
2013/08/12 07:17:03 --------------------------------------------------
2013/08/12 07:17:03 Mon Aug 12 07:17:02 2013 Deleting
.../backup/rdiff-backup-data/current_mirror.2013-08-11T00:55:43+02:00.data
Since I doubt that the delete operations take a long time (unless you're
running on a weird distributed filesystem, or a very heavily loaded
fileserver, in which case all bets are off), I'd say the delay is in
calculating the session statistics.
I don't know quite what it does, but scanning the code should help you,
or running strace against the process while it's in this delaying mode.
It might be stat()ing 48,000 files. That shouldn't take hours, but if
it's over a network, with high latency, it might do.
You could run rdiff-backup manually without any pipes, so that you can see
the output as it's generated and know when "deleting snapshot.gz" happens.
After that, it's calculating session statistics. Then attach strace to the
process and see what it's doing and which operations take a long time.
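A rough sketch of attaching strace, assuming the process command line matches the pattern 'rdiff-backup' (adjust the pgrep pattern to however your wrapper launches it):

```shell
#!/bin/sh
# Find the running rdiff-backup process. The pattern is an assumption;
# check with "ps aux | grep rdiff-backup" if it matches the wrong thing.
pid=$(pgrep -f 'rdiff-backup' | head -n 1)

# -c counts calls and time per syscall (press Ctrl-C to print the
# summary table); -f follows child processes too. Swap -c for -tt -T
# if you want a timestamp and duration on every individual call instead.
strace -f -c -p "$pid"
```

If the summary is dominated by stat()/lstat() with long per-call times, that points at filesystem latency rather than CPU.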
And as long as I don't know exactly what it is, I can't really get to
work on this because I don't know if I need more ram, more CPU power,
more CPU speed, faster disks, ...
You could look at which resource is most heavily used when the server is
running the backup jobs. top can help with that. Is the CPU at 100%? Is it
much less than 100%, but you have a high iowait time? Do you have a lot of
swap usage? Does the swap usage grow while the backups are running? Does
vmstat show more blocks in/out or swap in/out while the backups are
running?
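Non-interactive snapshots are handy for this; the standard procps tools can be run in batch mode from cron or a second terminal while the backup is in its slow phase:

```shell
#!/bin/sh
# Batch-mode snapshot of top: check %CPU, the "wa" (iowait) figure in
# the CPU line, and the swap line near the top of the output.
top -b -n 1 | head -n 15

# One vmstat sample per 5 seconds, 12 samples (one minute total).
# High bi/bo means heavy disk I/O; nonzero si/so means active swapping.
vmstat 5 12
```

Comparing a sample taken during the delay against one taken while the system is idle should show which resource the statistics phase is actually leaning on.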
Cheers, Chris.
--
_____ __ _
\ __/ / ,__(_)_ | Chris Wilson <address@hidden> Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Ruby/Perl/SQL Developer |
\__/_/_/_//_/___/ | We are GNU : free your mind & your software |