Re: coding tags and utf-16
From: Benjamin Riefenstahl
Date: Mon, 06 Mar 2006 20:35:15 +0100
User-agent: Gnus/5.1001 (Gnus v5.10.1) Emacs/21.3.50 (gnu/linux)

Hi,
Kenichi Handa writes:
> For decoding UTF-8, we should not delete that BOM but treat it as
> the content of the text. For UTF-16, Unicode explicitly says that
> "The BOM is not considered part of the content of the text", but for
> UTF-8, it doesn't say such a thing.

NOTEPAD.EXE (the basic MS Windows editor) adds a BOM when writing
UTF-8 files. When I saw that and tried to discuss it on their
newsgroups, I learned that Microsoft apparently considers this a
good thing.

That means files like that exist. Treating the BOM as content means
that U+FEFF creeps into the regular content of documents through
cut-and-paste and through components of template systems. I have
already seen that happen in real life, and of course it leads to
stupid bugs. I think Emacs should do better.
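
To make the template case concrete, here is a minimal sketch in Python (chosen only for illustration; it is not Emacs code, and the fragment names `header` and `body` are made up). It assumes two fragments that were each saved as UTF-8 with a BOM, the way Notepad writes them, and compares decoding that treats the BOM as content with decoding that treats it as a signature.

```python
import codecs

# Two hypothetical template fragments, each saved as UTF-8 with a BOM.
header = codecs.BOM_UTF8 + "Hello, ".encode("utf-8")
body = codecs.BOM_UTF8 + "world".encode("utf-8")

# The plain "utf-8" codec treats the BOM as content, so U+FEFF stays
# in each string and ends up in the middle of the joined document.
as_content = header.decode("utf-8") + body.decode("utf-8")

# The "utf-8-sig" codec treats a leading BOM as a signature and drops it.
as_signature = header.decode("utf-8-sig") + body.decode("utf-8-sig")

print(ascii(as_content))    # '\ufeffHello, \ufeffworld'
print(ascii(as_signature))  # 'Hello, world'
```

The second U+FEFF in the first result is the kind of stray character I mean: invisible in most displays, but it breaks string comparisons and parsers.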
> utf-16-be [==] utf-16be-with-signature [!=] utf-16be
;-)
benny