Re: utf-16le vs utf-16-le
From: Stephen J. Turnbull
Subject: Re: utf-16le vs utf-16-le
Date: Wed, 16 Apr 2008 01:51:50 +0900

Eli Zaretskii writes:
> > BOM-{prohibited,auto,required}.
>
> But we don't have these in Emacs, do we?

Huh? We don't have the full suite, but we do have -signature variants.

> > > Don't forget that en/decoding is used on strings as well, not only on
> > > buffers. Buffer-local variables won't cut it, I think.
> >
> > Strings don't have encoding signatures or newline variants
>
> ??? Of course, they do.

Indeed? Suppose I have a string as the value of the symbol `s'
containing the octets "\r\n". Please explain to me how to compute
whether that is the value 0x0D0A from a network stream prepared using
htons(3), or a line ending suitable for appending to a Windows file.
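
To make the ambiguity concrete, here is a minimal sketch in Python (not Emacs Lisp; the variable names are purely illustrative). It builds the same two octets both ways and shows that nothing in the resulting value records which one was meant:

```python
import struct

# A 16-bit integer 0x0D0A packed in network (big-endian) byte order,
# i.e. what a C program would put on the wire after htons().
as_network_short = struct.pack("!H", 0x0D0A)

# A Windows-style line ending, encoded as ASCII.
as_windows_eol = "\r\n".encode("ascii")

# The octets are identical; the value carries no trace of its origin.
assert as_network_short == as_windows_eol == b"\r\n"
```

Only the context the octets came from (a network protocol field, or a file with DOS line endings) says which reading applies.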

As I wrote before:

> > those octet sequences if present in a string are merely binary octet
> > sequences. They only have special semantics in external
> > representations. Where's the problem?
>
> A string can be sent to a process, for example, so we must have some
> way of generating an external representation for it.

Well, of course we must. But the right generalization of "buffer file
coding system" is not to apply en/decoding to strings, but rather to
give processes, sockets, etc. coding system properties equivalent
to my proposed buffer-local variables.

All I'm trying to say here is that "prepend a signature" and
"translate ?\n to appropriate EOL representation" and their inverses
make sense independently of the text encoding[1], and that the user
interface and API could be greatly clarified if it reflected that
fact. I suspect bugs like the one you encountered would be a lot less
frequent if the internal architecture reflected it too, but that might
be inefficient.
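
A rough sketch of that separation, in Python rather than Emacs Lisp. The functions `to_external` and `from_external` and their parameters are hypothetical names invented for this illustration; they are not an Emacs or Python API. The sketch assumes a Unicode UTF, so the signature is simply U+FEFF prepended before encoding:

```python
def to_external(text, encoding, eol="\n", signature=False):
    """Encode text, with EOL translation and signature as independent options."""
    if eol != "\n":
        # Step 1: translate the internal newline to the external EOL.
        text = text.replace("\n", eol)
    if signature:
        # Step 2: prepend a signature (U+FEFF; assumes a Unicode UTF).
        text = "\ufeff" + text
    # Step 3: the text encoding proper.
    return text.encode(encoding)


def from_external(data, encoding, eol="\n", signature=False):
    """Inverse of to_external: decode, strip signature, normalize EOLs."""
    text = data.decode(encoding)
    if signature and text.startswith("\ufeff"):
        text = text[1:]
    if eol != "\n":
        text = text.replace(eol, "\n")
    return text


octets = to_external("a\nb", "utf-16-le", eol="\r\n", signature=True)
assert from_external(octets, "utf-16-le", eol="\r\n", signature=True) == "a\nb"
```

Each of the three steps can be switched on or off without touching the other two, which is the independence argued for above. An endpoint (file, process, socket) would then carry the three settings as separate properties.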

Footnotes:
[1] Obviously "prepend a signature" needs to be parametrized by the
encoding in general, but in the case of Unicode UTFs it's actually
independent of the UTF.
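
The claim in [1] can be checked with a short Python sketch. The helper `signed` and the table `SIGNATURES` are illustrative names; the `codecs.BOM_*` constants are from the Python standard library. The operation is always "prepend U+FEFF, then encode", and only the resulting octets differ between UTFs:

```python
import codecs

# Signature octets for each UTF, as defined in Python's codecs module.
SIGNATURES = {
    "utf-8": codecs.BOM_UTF8,
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-16-be": codecs.BOM_UTF16_BE,
    "utf-32-le": codecs.BOM_UTF32_LE,
    "utf-32-be": codecs.BOM_UTF32_BE,
}


def signed(text, encoding):
    """Prepend U+FEFF at the character level, then encode."""
    return ("\ufeff" + text).encode(encoding)


# The same character-level operation yields the right signature for every UTF.
for encoding, bom in SIGNATURES.items():
    assert signed("hi", encoding) == bom + "hi".encode(encoding)
```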