Re: Filename encoding
From: Chris Vine
Subject: Re: Filename encoding
Date: Wed, 15 Jan 2014 21:42:57 +0000
On Wed, 15 Jan 2014 23:00:18 +0200
Eli Zaretskii <address@hidden> wrote:
> > Date: Wed, 15 Jan 2014 19:50:51 +0000
> > From: Chris Vine <address@hidden>
> > Cc: address@hidden
> >
> > POSIX system calls are encoding agnostic. The filename is just a
> > series of bytes terminating with a NUL character. All guile needs
> > to know is what encoding the person creating the filesystem has
> > adopted in naming files and which it needs to map to.
>
> This doesn't work well, because you cannot easily take apart and
> construct file names in encoding-agnostic ways. For example, some
> multibyte sequence in an arbitrary encoding could include the '/' or
> '\' characters, so searching for directory separators could fail,
> unless you use multibyte-aware string functions (which is a nuisance,
> because these functions only support a single locale at a time).
>
> So I think using UTF-8 internally is a much better way.
I am not sure what you mean, as I am not talking about internal use.
Guile uses ISO-8859-1 and UTF-32 internally for all its strings, which
is fine. glib uses UTF-32 and UTF-8 internally for most purposes. It
is the external representation which is at issue. This is just an
encoding transformation for the library to perform when looking up a
file (be it guile, glib or anything else).
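
A minimal sketch of that transformation, written in Python purely for illustration (it is not code from guile or glib). The helper names `to_fs_bytes` and `from_fs_bytes` and the `fs_encoding` parameter are hypothetical; the assumption is that the library somehow knows which encoding the filesystem's creator adopted.

```python
def to_fs_bytes(name: str, fs_encoding: str) -> bytes:
    """Map an internal (Unicode) file name to the bytes stored on disk.

    fs_encoding is assumed to be whatever encoding was adopted when the
    files were named; the system call itself only ever sees the bytes.
    """
    return name.encode(fs_encoding)


def from_fs_bytes(raw: bytes, fs_encoding: str) -> str:
    """Map the bytes returned by the filesystem back to an internal string."""
    return raw.decode(fs_encoding)


# The same internal name yields different byte sequences depending on
# the external encoding ("\u00e9" is LATIN SMALL LETTER E WITH ACUTE).
name = "caf\u00e9.txt"
utf8_bytes = to_fs_bytes(name, "utf-8")
latin1_bytes = to_fs_bytes(name, "iso-8859-1")
```

The internal representation never reaches the filesystem in this sketch; only the external encoding determines which bytes are looked up.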
As it happens (although this is beside the point), using a byte value or
sequence in a filename which the operating system reserves as the '/'
character for a purpose other than separating pathname components, or a
NUL character for anything other than marking the end of the filename,
is not POSIX compliant and will not work on any operating system I know
of, including Windows. (As for POSIX, see SUS, Base Definitions,
sections 3.170 (Filename) and 3.267 (Pathname).) But as I say, that is
irrelevant. Whatever the filesystem encoding happens to be, it happens
to be. It might not be a narrow encoding at all.
Chris
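
As an illustration of the byte values under discussion (an editorial sketch in Python, not code from the thread), the helper below checks whether a given byte value appears inside the encoded form of a multibyte character. The function name `has_byte_in_multibyte_char` is hypothetical, and the sample characters were chosen only because their encodings are well known.

```python
def has_byte_in_multibyte_char(text: str, encoding: str, byte: int) -> bool:
    """Return True if `byte` occurs within the encoded form of any
    character of `text` that takes more than one byte in `encoding`."""
    for ch in text:
        encoded = ch.encode(encoding)
        if len(encoded) > 1 and byte in encoded:
            return True
    return False


BACKSLASH = 0x5C  # '\' in ASCII
SLASH = 0x2F      # '/' in ASCII

# U+8868 encodes as the two bytes 0x95 0x5C in Shift_JIS, so a bytewise
# search for '\' finds a match inside the character; '/' does not occur.
sjis_backslash = has_byte_in_multibyte_char("\u8868", "shift_jis", BACKSLASH)
sjis_slash = has_byte_in_multibyte_char("\u8868", "shift_jis", SLASH)

# In UTF-8 every byte of a multibyte sequence is >= 0x80, so neither
# ASCII separator can appear inside one.
utf8_backslash = has_byte_in_multibyte_char("\u8868", "utf-8", BACKSLASH)

# In a wide encoding such as UTF-16LE, U+012F is stored as 0x2F 0x01,
# so the '/' byte value appears as half of an unrelated character.
utf16_slash = has_byte_in_multibyte_char("\u012f", "utf-16-le", SLASH)
```

Whether such byte sequences may legitimately appear in a pathname on a given system is a separate question; the sketch only shows which byte values the encodings produce.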