Re: Emacs Lisp's future
From: Eli Zaretskii
Subject: Re: Emacs Lisp's future
Date: Mon, 06 Oct 2014 18:08:29 +0300
> From: Mark H Weaver <address@hidden>
> Cc: address@hidden, address@hidden, address@hidden, address@hidden,
> address@hidden, address@hidden, address@hidden
> Date: Mon, 06 Oct 2014 02:21:41 -0400
>
> A related problem has to do with the fact that naively implemented UTF-8
> allows code points to be represented with more bytes than are actually
> needed, essentially by padding the code point with leading zeroes and
> then encoding with UTF-8 as if the high bits were non-zero. For
> example, the ASCII quote (") can be represented as the single byte 0x22,
> the two byte sequence 0xC0 0xA2, etc.
>
> UTF-8 decoders are supposed to detect and reject these "overlong"
> encodings, but it is likely that many programs fail to do this. Such
> programs are usually vulnerable to these overlong encodings when trying
> to detect special characters (e.g. for quoting/escaping) or when
> validating inputs.
>
> To cope with this, the Unicode standards require that UTF-8 codecs
> reject overlong encodings and other invalid byte sequences. This is in
> direct conflict with the idea of "raw byte" code points, whose purpose
> is to be tolerant of arbitrary byte sequences and to propagate them
> unchanged.
The obvious solution is to encode the raw bytes internally in a
UTF-8-compatible way, which is what Emacs does in its buffers and
strings, as I'm sure you know. Can't Guile do something similar?
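To illustrate the general idea of carrying raw bytes through a Unicode
string without losing them, here is a minimal sketch in Python. It is
not Emacs's internal representation: it uses Python's standard
"surrogateescape" error handler as a stand-in technique, and the helper
names decode_tolerant and encode_tolerant are made up for this example.

```python
def decode_tolerant(data: bytes) -> str:
    """Decode UTF-8, mapping each undecodable byte to a lone surrogate.

    Python's "surrogateescape" handler turns a stray byte 0xNN
    (0x80..0xFF) into the code point U+DCNN instead of raising.
    """
    return data.decode("utf-8", errors="surrogateescape")


def encode_tolerant(text: str) -> bytes:
    """Inverse of decode_tolerant: escaped surrogates become raw bytes again."""
    return text.encode("utf-8", errors="surrogateescape")


# A mostly valid UTF-8 payload with two stray bytes in the middle.
raw = b"caf\xc3\xa9 \xff\xfe end"
text = decode_tolerant(raw)

# The valid part decodes normally; the stray bytes survive as U+DCFF, U+DCFE.
assert text.startswith("caf\u00e9 ")
assert "\udcff\udcfe" in text

# Writing the text back out reproduces the original bytes exactly.
assert encode_tolerant(text) == raw
```

The point of the sketch is only that the stray bytes get their own
distinct code points, so they can never be confused with real
characters, and they are written back unchanged.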
> FWIW, I agree that the Emacs behavior is desirable when editing a file
> that may contain coding errors, but in most other cases (e.g. when
> communicating with processes or network sockets) I think that it's more
> appropriate to refuse to accept, produce, or propagate invalid UTF-8
> such as overlong encodings.
Emacs does indeed reject overlong encodings, but that doesn't mean it
disallows raw bytes as part of otherwise valid UTF-8 content. It's a
fact of life that such stray bytes sometimes happen, and users would
generally be unhappy if Emacs rejected a file because it contained them.
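
The distinction between rejecting overlong encodings and tolerating raw
bytes can be sketched as follows. This is an illustration in Python
under stated assumptions, not a description of Emacs's decoder: the
helper names are hypothetical, and "surrogateescape" again stands in
for a raw-byte representation.

```python
def overlong_two_byte(code_point: int) -> bytes:
    """Build the (invalid) two-byte overlong form of an ASCII code point.

    The code point is padded with leading zero bits and packed into the
    110xxxxx 10xxxxxx pattern that is only legal for U+0080..U+07FF.
    """
    if not 0 <= code_point < 0x80:
        raise ValueError("only ASCII code points are handled in this sketch")
    return bytes([0xC0 | (code_point >> 6), 0x80 | (code_point & 0x3F)])


def is_valid_utf8(data: bytes) -> bool:
    """Strict check: Python's UTF-8 decoder rejects overlong sequences."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def decode_keeping_raw_bytes(data: bytes) -> str:
    """Tolerant decode: invalid bytes are kept as raw-byte placeholders."""
    return data.decode("utf-8", errors="surrogateescape")


quote = ord('"')                      # 0x22
overlong = overlong_two_byte(quote)   # the sequence mentioned in the quoted text
assert overlong == b"\xc0\xa2"

# A strict decoder refuses the overlong form outright.
assert not is_valid_utf8(overlong)
assert is_valid_utf8(b'"')

# A tolerant decoder keeps the two bytes as raw bytes; it does NOT
# interpret them as a quote character, so quoting/escaping checks on
# the decoded text are not fooled.
decoded = decode_keeping_raw_bytes(b"say " + overlong)
assert '"' not in decoded
assert decoded.encode("utf-8", errors="surrogateescape") == b"say \xc0\xa2"
```

In other words, the overlong sequence is never decoded to the character
it pretends to be; it is either rejected or preserved as opaque bytes.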