[rdiff-backup-users] known issues? trailing spaces on vfat, hyphen as ra
From: Marcel Cary
Subject: [rdiff-backup-users] known issues? trailing spaces on vfat, hyphen as range in regex, long filenames
Date: Fri, 1 Jun 2007 09:09:39 -0700 (PDT)
I'm trying to get rdiff-backup to replace my tar | gzip | split backup
system that does full backups every time. So far I've found these issues:
1. rdiff-backup seems to choke on files with trailing spaces when backing
up to a vfat filesystem
2. There appears to be a bug in the regex for quoting characters that
causes '*' to not be quoted when it should
3. rdiff-backup turns some ~140 character filenames into ~260 character
filenames on vfat
I've seen that (3) is a known issue (bug 12823, fixed in CVS late 2005),
but I would have expected that fix to make it into my version 1.0.4
(released early 2006). But I'm still having long filename issues.
Have other folks seen (1) and (2) behaviors? Are they known issues? I
don't see them in the Savannah bug database. I'd be happy to file bugs if
that would be helpful. Or perhaps I should be installing the latest
stable rdiff-backup from source to avoid these issues.
The details:
My backup disk is currently formatted with a FAT filesystem, which appears
to disallow filenames with trailing spaces.
$ sudo touch 'foo '
touch: setting times of `foo ': No such file or directory
$ sudo mkdir 'foo '
mkdir: cannot create directory `foo ': Invalid argument
$ mount
...
/dev/sda1 on /media/usbdisk type vfat
(rw,nosuid,nodev,noatime,flush,uid=1001,utf8,shortname=lower)
rdiff-backup chokes on a file with trailing spaces like this:
Traceback (most recent call last):
File "/usr/bin/rdiff-backup", line 23, in ?
rdiff_backup.Main.Main(sys.argv[1:])
File "/usr/lib/python2.4/site-packages/rdiff_backup/Main.py", line 285, in
Main
take_action(rps)
File "/usr/lib/python2.4/site-packages/rdiff_backup/Main.py", line 255, in
take_action
elif action == "backup": Backup(rps[0], rps[1])
File "/usr/lib/python2.4/site-packages/rdiff_backup/Main.py", line 308, in
Backup
backup.Mirror(rpin, rpout)
File "/usr/lib/python2.4/site-packages/rdiff_backup/backup.py", line 38, in
Mirror
DestS.patch(dest_rpath, source_diffiter)
File "/usr/lib/python2.4/site-packages/rdiff_backup/backup.py", line 218,
in patch
ITR(diff.index, diff)
File "/usr/lib/python2.4/site-packages/rdiff_backup/rorpiter.py", line 288,
in __call__
branch.start_process(*args)
File "/usr/lib/python2.4/site-packages/rdiff_backup/backup.py", line 548,
in start_process
if diff_rorp.isdir(): self.prepare_dir(diff_rorp, base_rp)
File "/usr/lib/python2.4/site-packages/rdiff_backup/backup.py", line 574,
in prepare_dir
base_rp.mkdir()
File "/usr/lib/python2.4/site-packages/rdiff_backup/rpath.py", line 796, in
mkdir
self.conn.os.mkdir(self.path)
OSError: [Errno 22] Invalid argument:
'/media/usbdisk/filesystem_backup/rdiff-backup/home/somebody/;068ocuments/;069xtended
;070amily & ;067ommunity/;071ifts ;082eceived ;076ists '
Exception exceptions.TypeError: "'NoneType' object is not callable" in <bound method
GzipFile.__del__ of <gzip open file
'/media/usbdisk/filesystem_backup/rdiff-backup/rdiff-backup-data/file_statistics.2007-05-30;08407;05815;05817-07;05800.data.gz',
mode 'wb' at 0xb7b35020 -0x486ef474>> ignored
Exception exceptions.TypeError: "'NoneType' object is not callable" in <bound method
GzipFile.__del__ of <gzip open file
'/media/usbdisk/filesystem_backup/rdiff-backup/rdiff-backup-data/error_log.2007-05-30;08407;05815;05817-07;05800.data.gz',
mode 'wb' at 0xb7baff98 -0x486ef454>> ignored
Exception exceptions.TypeError: "'NoneType' object is not callable" in <bound method
GzipFile.__del__ of <gzip open file
'/media/usbdisk/filesystem_backup/rdiff-backup/rdiff-backup-data/mirror_metadata.2007-05-30;08407;05815;05817-07;05800.snapshot.gz',
mode 'wb' at 0xb7b35068 -0x486ef2b4>> ignored
It turns out that in this particular case, I can work around the problem
by asking the owner of that directory to rename it; it was certainly an
error to name the directory that way.
In spite of the easy work-around, it might be worth fixing. I'd guess
that escaping a trailing space would work, but might be difficult without
escaping *all* spaces, which would make the transformed filenames uglier
than necessary. (Note that a leading or interior space is no problem for
vfat.) I'm sure there are other solutions as well.
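To make the trailing-space idea concrete, here is a minimal Python sketch. It assumes the ';' plus three-digit decimal quoting that is visible in the traceback above; the function name is hypothetical and this is not rdiff-backup's actual code.

```python
import re

def quote_trailing_spaces(name):
    """Replace only the run of trailing spaces with ';032' escapes.

    Hypothetical helper for illustration. Leading and interior spaces
    are left alone because vfat accepts them.
    """
    match = re.search(r" +$", name)
    if match is None:
        return name
    # One ';032' per trailing space, so the original can be restored.
    return name[:match.start()] + ";032" * len(match.group())

print(quote_trailing_spaces("Gifts Received Lists "))
```

This only stays reversible if a literal ';' in a filename is itself quoted, which the "|;" alternative in the quoting regex appears to take care of.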
Now for the regex error... after working around the previous issue, I get
another stack trace on a file named
/usr/share/guile/1.6/ice-9/and-let*.scm. Note the asterisk in the
filename. Also, in the output from rdiff-backup:
Characters needing quoting '^a-z0-9_ -.'
Note that the last dash is between a space and a period, which, in a
regex character class, means from ord(" ") to ord("."). Asterisk falls
within this range. A fix is to move the dash last in the character class
so that it will not be interpreted as a range.
>>> import re
>>> r = re.compile("[^a-z0-9_ -.]|;")
>>> r.sub("+", "and-let*.scm.2007-05-30T16:15:53-07:00.missing")
'and-let*.scm.2007-05-30+16+15+53-07+00.missing'   <-- asterisk not replaced
>>> r = re.compile("[^a-z0-9_ .-]|;")
>>> r.sub("+", "and-let*.scm.2007-05-30T16:15:53-07:00.missing")
'and-let+.scm.2007-05-30+16+15+53-07+00.missing'
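For comparison, here is a rough sketch of a quoting function built on the corrected character class. It assumes the quoted form is ';' followed by the character's three-digit decimal code, as the escaped names in the traceback suggest; the names CHARS_TO_QUOTE and quote are made up for this example and are not taken from the rdiff-backup source.

```python
import re

# The hyphen is placed last so it is a literal, not a range operator.
CHARS_TO_QUOTE = re.compile(r"[^a-z0-9_ .-]|;")

def quote(name):
    """Replace each unsafe character with ';' plus its 3-digit ordinal."""
    return CHARS_TO_QUOTE.sub(lambda m: ";%03d" % ord(m.group()), name)

print(quote("and-let*.scm"))
print(quote("2007-05-30T07:15:17-07:00"))
```

With the hyphen last, the asterisk comes out as ;042, and the timestamp comes out in the same form as the names under rdiff-backup-data in the traceback.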
And finally, the long filename issue. KDE's RSS reader Akregator stores
feed data in files named by their URLs. For example:
~/.kde/share/apps/akregator/Archive/http___akregator.sf.net_rss2.php.mk4
But some URLs are longer. And feeds for search results have lots of '&'
and '='. Between the escaping of those characters, the timestamp suffix
added by rdiff-backup, and the escaping of the characters in the
timestamp, several feeds with ~140 character URLs get transformed into
~260 character increment files. And with that, rdiff-backup fails.
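The growth can be estimated with a few lines of Python. This is only a sketch under assumptions: each quoted character becomes four characters (';' plus three digits), the increment name has the form name, dot, timestamp, suffix, and the limit is vfat's 255-character long filename limit. The helper names and the sample feed name are invented for illustration.

```python
import re

UNSAFE = re.compile(r"[^a-z0-9_ .-]|;")
VFAT_MAX_NAME = 255  # vfat long filename limit, in characters

def quoted_length(name):
    """Length of name once every unsafe character takes 4 characters."""
    return len(name) + 3 * len(UNSAFE.findall(name))

def increment_length(name, timestamp, suffix=".diff.gz"):
    """Estimated length of '<name>.<timestamp><suffix>' after quoting."""
    return quoted_length(name) + 1 + quoted_length(timestamp) + len(suffix)

# Invented feed name with many '&', '=' and uppercase characters.
feed = "http___example.org_search" + "&q=Foo" * 19 + ".mk4"
total = increment_length(feed, "2007-05-30T16:15:53-07:00")
print(len(feed), total, total > VFAT_MAX_NAME)
```

Every '&', '=' and uppercase letter costs three extra characters, so names dominated by query strings grow much faster than ordinary filenames.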
I ran across a link to duplicity, which, aside from encoding backups in a
tar-style archive, appends metadata with a prepended directory. Maybe
this would alleviate some of the long filename pain? For example, the
increment for the above files could be named:
2007-05-30;08419;05851;05805-07;05800/http___akregator.sf.net_rss2.php.mk4.diff.gz
Note that many of the increments in the same directory share the same
timestamp, so I would generally expect several diffs in each timestamp
directory. The .diff.gz could also potentially be handled this way. Or
maybe by generating random names as suggested in bug 12823 if that's done
and working.
I'm currently working around the issue by not backing up those files:
--exclude-regexp '/home/[^/]+/\.kde/share/apps/akregator/Archive/[^/]{53,}'
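A quick way to check which paths that pattern catches is to try it in Python. This sketch assumes the pattern is applied as an unanchored Python regex search against the full path, which I have not verified against the rdiff-backup source; the user name and the long filename are placeholders.

```python
import re

pattern = re.compile(
    r"/home/[^/]+/\.kde/share/apps/akregator/Archive/[^/]{53,}")

archive = "/home/somebody/.kde/share/apps/akregator/Archive/"
short_path = archive + "http___akregator.sf.net_rss2.php.mk4"
long_path = archive + "x" * 53  # placeholder for a long feed filename

for path in (short_path, long_path):
    # Only filenames of 53 or more characters should be excluded.
    print(len(path) - len(archive), bool(pattern.search(path)))
```

The short feed file from the example above is kept, while any file in that directory whose name reaches 53 characters is excluded.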
I'm not ready to drop my old tarball strategy as long as rdiff-backup is
skipping those files. Scarier still is that after adding the exclude
regexp, I had to remove the entire backup repository to get a successful
backup. I think that's because the first pass at some of those files did
not require increments which append the 47 character (when escaped)
timestamp suffix, and perhaps when excluding the files it was necessary to
record their removal from the repository by creating an increment.
I'm thrilled to see that in spite of vfat's limitations, rdiff-backup can
keep track of unix-type details like owner, group, permissions, and
special files. I chose vfat so I can have (limited) access to backup
data before I'm able to reinstall my Linux server after a failure.
I'm ready to get rid of all but the most recent tarball from my old
strategy, and hopefully soon I can ditch the tarball strategy
altogether.
$ uname -a
Linux foo 2.6.16.13-4-default #1 Wed May 3 04:53:23 UTC 2006 i686 i686 i386
GNU/Linux
$ python -V
Python 2.4.2
$ rdiff-backup --version
rdiff-backup 1.0.4
Marcel