Re: coding tags and utf-16

emacs-devel

From:	Benjamin Riefenstahl
Subject:	Re: coding tags and utf-16
Date:	Mon, 06 Mar 2006 20:35:15 +0100
User-agent:	Gnus/5.1001 (Gnus v5.10.1) Emacs/21.3.50 (gnu/linux)

Hi,

Kenichi Handa writes:
> For decoding UTF-8, we should not delete that BOM but treat it as
> the content of the text.  For UTF-16, Unicode explicitly says that
> "The BOM is not considered part of the content of the text", but for
> UTF-8, it doesn't say such a thing.

NOTEPAD.EXE (the basic MS Windows editor) adds a BOM when writing
UTF-8 files.  When I saw that and tried to discuss it on their
newsgroups, I learned that it seems to be Microsoft's POV that this is
a good thing.

So files like that exist.  Treating the BOM as content means that
U+FEFF creeps into the regular content of documents through
cut-and-paste and through components of template systems.  I have
already seen that happen in real life, and of course it leads to
stupid bugs.  I think Emacs should do better.
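
To make the failure mode concrete, here is a minimal sketch in Python
(not Emacs Lisp, and not how Emacs decodes anything).  The two byte
strings are hypothetical stand-ins for template fragments that were
each saved as UTF-8 with a leading BOM.  Python's "utf-8" codec keeps
the BOM as a character, while "utf-8-sig" drops a single leading BOM,
so the sketch shows both behaviours side by side.

```python
import codecs

# Hypothetical file contents: two template fragments, each saved as
# UTF-8 with a leading BOM (the bytes EF BB BF).
header = codecs.BOM_UTF8 + "Dear ".encode("utf-8")
body = codecs.BOM_UTF8 + "customer".encode("utf-8")

# Plain "utf-8" treats the BOM as content: it decodes to U+FEFF.
as_content = header.decode("utf-8") + body.decode("utf-8")

# "utf-8-sig" removes one leading BOM, if present, while decoding.
stripped = header.decode("utf-8-sig") + body.decode("utf-8-sig")

print(repr(as_content))  # U+FEFF at the start and again mid-string
print(repr(stripped))    # no U+FEFF anywhere
```

Once the fragments are joined, the second U+FEFF sits in the middle
of the text, where no decoder will ever treat it as a signature
again.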

> utf-16-be [==] utf-16be-with-signature [!=] utf-16be

;-)

benny

Current Thread

Re: coding tags and utf-16, Kenichi Handa, 2006/03/02
- Re: coding tags and utf-16, Benjamin Riefenstahl, 2006/03/04
  - Re: coding tags and utf-16, Kenichi Handa, 2006/03/06
    - Re: coding tags and utf-16, Benjamin Riefenstahl <=
    - Re: coding tags and utf-16, Kenichi Handa, 2006/03/06
  - Re: coding tags and utf-16, Tomas Zerolo, 2006/03/08
- Re: coding tags and utf-16, Kenichi Handa, 2006/03/15
