[rdiff-backup-users] Memory benchmarks and some questions
From: David
Subject: [rdiff-backup-users] Memory benchmarks and some questions
Date: Fri, 9 May 2008 11:34:45 +0200
Hi list.
With regard to my previous mails (memory usage problems, possible
hardlink bug issues), I ran some benchmarks and have some questions
which I hope someone (ideally a developer) can answer.
Here are the steps I followed (and which can be reproduced by
interested parties) for the benchmarks:
1) Make temporary directory structure:
mkdir -p /tmp/test_rdiffbackup/other_server_rsync
mkdir -p /tmp/test_rdiffbackup/rsync_tmp_local
mkdir -p /tmp/test_rdiffbackup/bkp_store
Summary of dirs:
- other_server_rsync - represents files on another server, only
accessible by rsync login
- rsync_tmp_local - temporary staging area, where we rsync files into
before running rdiff-backup
- bkp_store - local rdiff-backup store (with a rdiff-backup-data
directory, etc)
2) Make 10,000 temporary files on the 'other server'
# First declare a function, because we run this logic a few times
make_files() {
    i=0
    while [ $i -lt 10000 ]; do
        mktemp -p /tmp/test_rdiffbackup/other_server_rsync
        i=$((i + 1))
    done
}
# Now call the function
make_files
3) Rsync them from 'remote server' to rsync staging area
rsync -va /tmp/test_rdiffbackup/other_server_rsync/ \
    /tmp/test_rdiffbackup/rsync_tmp_local/
4) Start (in another terminal) a command to measure rdiff-backup memory usage:
top -b -d 0.1 | grep rdiff-backup
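(As an alternative to grepping 'top' output, resident memory can be
sampled directly from /proc. A minimal Python sketch — Linux-specific,
and 'rss_kb' is just my own helper name, not anything from rdiff-backup:)

```python
import os

def rss_kb(pid):
    """Return the resident set size of `pid` in kB (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # Line looks like: "VmRSS:     11264 kB"
                return int(line.split()[1])
    return None

# Example: sample our own process
print(rss_kb(os.getpid()))
```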
5) Run rdiff-backup:
rdiff-backup /tmp/test_rdiffbackup/rsync_tmp_local \
    /tmp/test_rdiffbackup/bkp_store
# Last 'top' line has this memory usage: 15988 11m 2676 (VIRT/RES/SHR)
6) Re-run 'server update & backup' cycle a few times, to get some
memory statistics for normal usage:
update_bkp() {
    make_files
    rsync -va --delete /tmp/test_rdiffbackup/other_server_rsync/ \
        /tmp/test_rdiffbackup/rsync_tmp_local/
    rdiff-backup /tmp/test_rdiffbackup/rsync_tmp_local \
        /tmp/test_rdiffbackup/bkp_store
}
update_bkp # 20240 15m 2732 (VIRT/RES/SHR)
update_bkp # 20396 16m 2732 - Increased by 156kb
update_bkp # 21176 16m 2732 - Increased by 780kb
update_bkp # 21276 16m 2732 - Increased by 100kb
update_bkp # 21612 17m 2732 - Increased by 336kb
So, it looks like there is a (roughly) linear increase in memory usage
as the number of files increases: between 100 and 780 kB per 10,000
files, i.e. roughly 10-80 bytes per file.
A few questions at this point:
a) Is this normal?
b) Won't this cause problems when backing up huge numbers of files
(millions or more) to a memory-limited backup server?
c) Is it possible for rdiff-backup to use Python generator functions
instead of lists, to keep memory usage down?
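(To illustrate what I mean in (c), a quick sketch of the memory
difference between materializing a list of 10,000 names up front and
yielding them lazily; the sizes shown are only indicative:)

```python
import sys

# A list holds references to all 10,000 strings at once ...
paths_list = ["file_%05d" % i for i in range(10000)]

# ... while a generator produces one item at a time, on demand,
# so its size does not grow with the number of files.
paths_gen = ("file_%05d" % i for i in range(10000))

print(sys.getsizeof(paths_list))  # tens of kB for the list object alone
print(sys.getsizeof(paths_gen))   # a few hundred bytes, regardless of length
```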
7) Next test - using hardlinks to limit disk usage on the backup server.
If the /tmp/test_rdiffbackup/bkp_store directory is massive, then we
don't want to use up the same amount of space under
/tmp/test_rdiffbackup/rsync_tmp_local. The most obvious solution to
this is to use hardlinks between unchanged files.
So, let's update our 'update_bkp()' function:
update_bkp() {
    mkdir -p /tmp/test_rdiffbackup/rsync_tmp_local
    rsync -va --link-dest=/tmp/test_rdiffbackup/bkp_store/ \
        --exclude=/rdiff-backup-data \
        /tmp/test_rdiffbackup/bkp_store/ \
        /tmp/test_rdiffbackup/rsync_tmp_local/
    make_files
    rsync -va --delete /tmp/test_rdiffbackup/other_server_rsync/ \
        /tmp/test_rdiffbackup/rsync_tmp_local/
    rdiff-backup /tmp/test_rdiffbackup/rsync_tmp_local \
        /tmp/test_rdiffbackup/bkp_store
    rm -rf /tmp/test_rdiffbackup/rsync_tmp_local
}
8) Re-run the test function a few times, and gather stats from the
other terminal:
# First, clear out the directories, to speed up rdiff-backup for these
# tests (otherwise they take a *long* time to finish)
rm -rvf /tmp/test_rdiffbackup/other_server_rsync /tmp/test_rdiffbackup/bkp_store
mkdir /tmp/test_rdiffbackup/other_server_rsync /tmp/test_rdiffbackup/bkp_store
# Next, run the tests and monitor memory usage:
update_bkp # 15972 11m 2704 (VIRT/RES/SHR)
update_bkp # 33748 29m 2760 - Increased by 17,776kb (first backup where history already existed)
update_bkp # 34524 30m 2760 - Increased by 700kb
update_bkp # 36036 31m 2760 - Increased by 1,512kb
update_bkp # 42336 37m 2760 - Increased by 6,300kb (also took a lot longer to run than the previous backups)
update_bkp # 45896 41m 2760 - Increased by 3,560kb
From the above stats it looks like rdiff-backup uses an extra 1-6 MB
per 10,000 files when hardlinks are involved.
Some questions at this point:
d) Is an extra 3-6 MB really needed per extra 10,000 files (with
hardlinks)? That's 300-600 bytes per file. Aren't there more efficient
structures that can be used? How does rsync handle hard-link
preservation?
e) Does rdiff-backup really need to use its hardlink-handling logic
in this case? None of the files under the store are hardlinked to each
other. I assume this is happening because the hardlink count per file
is greater than 1.
f) Why does rdiff-backup go so much slower when hardlinks are involved?
g) Could rdiff-backup get a new option (eg: --min-hard-link-count)
which sets when the hardlink logic activates? The default would be 2,
but for cases like this, users could use 3 instead.
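(A sketch of what the check in (e)/(g) could look like — to be clear,
'needs_hardlink_tracking' and 'min_hard_link_count' are hypothetical
names of my own, not actual rdiff-backup code:)

```python
import os
import tempfile

def needs_hardlink_tracking(path, min_hard_link_count=2):
    """Hypothetical check: only treat `path` as hard-linked when its
    link count reaches the configured threshold (today's behaviour
    corresponds to a threshold of 2)."""
    return os.lstat(path).st_nlink >= min_hard_link_count

# Demo: one file, plus a single hard link to it (st_nlink becomes 2)
with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "a")
    open(a, "w").close()
    os.link(a, os.path.join(d, "b"))
    print(needs_hardlink_tracking(a))                          # True
    print(needs_hardlink_tracking(a, min_hard_link_count=3))   # False
```

With a threshold of 3, files that merely carry a second link from the
rsync staging trick above would skip the hardlink bookkeeping entirely.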
David.