From: Elf
Subject: Re: [Chicken-users] BOM in a Scheme source file
Date: Sat, 8 Sep 2007 23:54:55 -0700 (PDT)
And according to the Unicode Consortium:

A: Yes, UTF-8 can contain a BOM. However, it makes no difference as to the endianness of the byte stream. UTF-8 always has the same byte order. An initial BOM is only used as a signature -- an indication that an otherwise unmarked text file is in UTF-8. Note that some recipients of UTF-8 encoded data do not expect a BOM. Where UTF-8 is used transparently in 8-bit environments, the use of a BOM will interfere with any protocol or file format that expects specific ASCII characters at the beginning, such as the use of "#!" at the beginning of Unix shell scripts. [AF] & [MD]

and

In the absence of a protocol supporting its use as a BOM and when not at the beginning of a text stream, U+FEFF should normally not occur.

and

3. Some byte oriented protocols expect ASCII characters at the beginning of a file. If UTF-8 is used with these protocols, use of the BOM as encoding form signature should be avoided.

4. Where the precise type of the data stream is known (e.g. Unicode big-endian or Unicode little-endian), the BOM should not be used. In particular, whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE or UTF-32LE a BOM must not be used.

See also [

Why not fix SciTE so it doesn't put in chars it shouldn't?

-elf

On Sun, 9 Sep 2007, Pierpaolo Bernardi wrote:
> On 9/9/07, Graham Fawcett <address@hidden> wrote:
> > On 9/8/07, Pierpaolo Bernardi <address@hidden> wrote:
> > > UTF8 has no BOM. A BOM in a utf8 file should be there only if you put it there.
> >
> > Not true. http://en.wikipedia.org/wiki/Byte_Order_Mark
>
> UTF8 is defined by the Unicode consortium, not by wikipedia. See here for example:
>
> http://unicode.org/faq/utf_bom.html#29
>
> which says that you can put a BOM in a utf8 file (of course, you can put whatever character you want in a file), but it is a character like every other character; it has no particular meaning with respect to the encoding.
>
> Then, maybe chicken could consider U+FEFF as whitespace, to work around this bug in scite, and maybe other broken tools.
>
> P.
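The points made in this thread can be checked with a short sketch. It is written in Python purely for illustration and does not show Chicken's actual reader. The shebang's interpreter path is a made-up example. The helpers `read_source`, `is_reader_whitespace`, and `tokens` are hypothetical and are not part of any Chicken API. The sketch shows four things. U+FEFF has a single fixed UTF-8 encoding, so in UTF-8 it can only serve as a signature. A UTF-16 BOM read with the wrong byte order turns into U+FFFE. A leading EF BB BF hides `#!` from a byte-oriented check. Finally, it shows roughly what treating U+FEFF as reader whitespace might look like.

```python
import codecs

BOM = "\ufeff"  # U+FEFF ZERO WIDTH NO-BREAK SPACE, used as a byte order mark

# UTF-8 encodes U+FEFF as one fixed byte sequence: there is no byte order
# to mark, so in UTF-8 it can only act as an encoding signature.
UTF8_BOM = BOM.encode("utf-8")             # b'\xef\xbb\xbf'
assert UTF8_BOM == codecs.BOM_UTF8

# UTF-16 does have two byte orders, which is what the BOM distinguishes.
UTF16BE_BOM = BOM.encode("utf-16-be")      # b'\xfe\xff'
UTF16LE_BOM = BOM.encode("utf-16-le")      # b'\xff\xfe'

# Read with the wrong byte order, the BOM becomes U+FFFE, a noncharacter.
MISREAD = UTF16LE_BOM.decode("utf-16-be")  # '\ufffe'


def starts_with_shebang(data: bytes) -> bool:
    """Byte-oriented check for '#!' at offset 0, like a Unix kernel does."""
    return data[:2] == b"#!"


def read_source(raw: bytes) -> str:
    """Hypothetical helper: decode UTF-8, dropping one leading EF BB BF."""
    return raw.decode("utf-8-sig")


def is_reader_whitespace(ch: str) -> bool:
    """Hypothetical reader predicate: ordinary whitespace plus U+FEFF.

    Python's str.isspace() does not treat U+FEFF (category Cf) as whitespace.
    """
    return ch.isspace() or ch == BOM


def tokens(src: str) -> list[str]:
    """Toy whitespace splitter, only to show the predicate in use."""
    out: list[str] = []
    cur: list[str] = []
    for ch in src:
        if is_reader_whitespace(ch):
            if cur:
                out.append("".join(cur))
                cur = []
        else:
            cur.append(ch)
    if cur:
        out.append("".join(cur))
    return out


if __name__ == "__main__":
    # Hypothetical script; the interpreter path is only an illustration.
    script = '#!/usr/local/bin/csi -script\n(display "hi")\n'
    plain = script.encode("utf-8")
    with_bom = script.encode("utf-8-sig")  # prepends EF BB BF
    print(starts_with_shebang(plain), starts_with_shebang(with_bom))  # True False
    print(read_source(with_bom) == script)                            # True
    print(tokens("\ufeff(define x 1)"))  # ['(define', 'x', '1)']
```

Stripping only a leading BOM, as the `utf-8-sig` codec does, follows the FAQ's advice that U+FEFF should normally not occur except at the start of a stream. Treating it as whitespace everywhere is the broader workaround suggested in the thread.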