From: Alastair Rankine
Subject: Re: [rdiff-backup-users] What filename characters does Mac OS X support?
Date: Sun, 23 Oct 2005 15:56:53 +1000
On 22/10/2005, at 12:35 PM, Ben Escoto wrote:
Ben, I don't know what you mean by "translate [unicode characters] to ascii". That just isn't possible in general, but perhaps you mean translating these characters to UTF-8 (i.e. a char * in C)? In that case you should look at the Python string "encode" method and/or the libiconv C library.

However, after some further investigation I'm not entirely sure you need to worry about the table of illegal unicode characters I quoted earlier. I just ran the following experiment:

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    # Create three files whose names begin with "é", spelled three different ways.
    open(u"é composed char", "w").close()                  # literal é in the UTF-8 source
    open(u"\u00e9 escaped composed", "w").close()          # U+00E9, the precomposed character
    open(u"\u0065\u0301 escaped decomposed", "w").close()  # "e" + U+0301 COMBINING ACUTE ACCENT

This resulted in the é character appearing correctly in each of the three output filenames. (I'd include the output of "ls" here, but it doesn't seem to be unicode aware.)

So even though U+00E9 is explicitly designated as an illegal character by the filesystem specification, it looks like the OS silently performs the required decomposition into the U+0065, U+0301 sequence before the name reaches the disk. Decomposition is therefore a requirement of the *on-disk* format for some unicode characters, but in practice it doesn't seem to make any difference to applications, because the OS produces the correct on-disk representation itself.

Interestingly, the OS also hands back the decomposed form when reading the names from disk, even for the files that were created with the composed character:

    >>> os.listdir(u".")
    [u'e\u0301 composed char', u'e\u0301 escaped composed', u'e\u0301 escaped decomposed']

This is not important for rdiff-backup, just an interesting aside. Anyway, it seems that any character in the unicode character set is usable in Mac OS X filenames.
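The composed and decomposed spellings in the experiment above correspond to the Unicode normalization forms NFC and NFD. The following Python 3 sketch uses the standard unicodedata module (it is not part of the original experiment) to show how the two spellings relate and which UTF-8 bytes each one produces, i.e. what would end up in a C char *. Note the assumption: HFS+ applies its own variant of NFD, so the standard library's NFD is only a close approximation of what the filesystem does.

```python
import unicodedata

composed = "\u00e9"          # é as a single precomposed code point
decomposed = "\u0065\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT

# The two strings render identically but are different code point sequences.
print(composed == decomposed)  # False

# NFD splits the precomposed character into base letter + combining mark,
# approximately the form HFS+ writes to disk (assumption: Apple's variant differs
# from standard NFD only for a few character ranges).
print(unicodedata.normalize("NFD", composed) == decomposed)  # True

# NFC joins the sequence back into the single precomposed code point.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

# Encoding to UTF-8 gives the byte strings a C program would see.
print(composed.encode("utf-8"))    # b'\xc3\xa9'
print(decomposed.encode("utf-8"))  # b'e\xcc\x81'
```

Comparing filenames only after normalizing both sides to the same form is one way to avoid treating the two spellings of "é" as different names.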
I'm sorry, I still don't get it. If the destination filesystem is case *preserving* (which in this case it is), surely this makes the quoting unnecessary?
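The two properties in question are different: a case-preserving filesystem lists names back exactly as they were written, while a case-sensitive one treats names that differ only in case as distinct files. HFS+ as normally configured is case preserving but not case sensitive. The following Python 3 sketch, built around a hypothetical helper `probe_case_behaviour` (not part of rdiff-backup), checks both properties for a given directory by creating a mixed-case file in a scratch subdirectory.

```python
import os
import tempfile


def probe_case_behaviour(directory):
    """Hypothetical helper: return (case_preserving, case_sensitive) for the
    filesystem holding `directory`, using a throwaway test file."""
    # A fresh subdirectory keeps the probe from clashing with existing files.
    probe_dir = tempfile.mkdtemp(dir=directory)
    name = "CaseProbe"
    path = os.path.join(probe_dir, name)
    try:
        open(path, "w").close()

        # Case preserving: the mixed-case name is listed back unchanged.
        case_preserving = name in os.listdir(probe_dir)

        # Case sensitive: a lower-case spelling does NOT find the same file.
        case_sensitive = not os.path.exists(os.path.join(probe_dir, name.lower()))
    finally:
        # Clean up whether or not the checks succeeded.
        if os.path.exists(path):
            os.remove(path)
        os.rmdir(probe_dir)
    return case_preserving, case_sensitive


if __name__ == "__main__":
    print(probe_case_behaviour(tempfile.gettempdir()))
```

On a default HFS+ volume this would be expected to report (True, False); on a typical case-sensitive Linux filesystem, (True, True).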