Re: [Chicken-users] BOM in a Scheme source file


From: Elf
Subject: Re: [Chicken-users] BOM in a Scheme source file
Date: Sat, 8 Sep 2007 23:54:55 -0700 (PDT)


and according to the unicode consortium:

 A: Yes, UTF-8 can contain a BOM. However, it makes no difference as
     to the endianness of the byte stream. UTF-8 always has the same
     byte order. An initial BOM is only used as a signature -- an
     indication that an otherwise unmarked text file is in UTF-8. Note
     that some recipients of UTF-8 encoded data do not expect a BOM.
     Where UTF-8 is used transparently in 8-bit environments, the use of
     a BOM will interfere with any protocol or file format that expects
     specific ASCII characters at the beginning, such as the use of "#!"
     at the beginning of Unix shell scripts. [AF] & [MD]

and
     In the absence of a protocol supporting its use as a BOM and
     when not at the beginning of a text stream, U+FEFF should normally
     not occur.
and
     3. Some byte oriented protocols expect ASCII characters at the
        beginning of a file. If UTF-8 is used with these protocols, use of
        the BOM as encoding form signature should be avoided.
     4. Where the precise type of the data stream is known (e.g. Unicode
        big-endian or Unicode little-endian), the BOM should not be used.
        In particular, whenever a data stream is declared to be UTF-16BE,
        UTF-16LE, UTF-32BE or UTF-32LE a BOM must not be used. See also [

why not fix scite to not put in chars it shouldn't?
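
For files that already carry the signature, a minimal sketch of removing it outside the editor might look like the following. It is written in Python purely for illustration; the helper names `strip_utf8_bom` and `starts_with_shebang` and the sample script are assumptions, not part of scite or chicken. It also shows the "#!" problem the FAQ describes: with the three BOM bytes in front, the file no longer starts with "#!".

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def starts_with_shebang(data: bytes) -> bool:
    """True if the very first two bytes are '#!', as a Unix kernel expects."""
    return data.startswith(b"#!")


# Hypothetical script contents, saved by an editor that prepends a BOM.
sample = codecs.BOM_UTF8 + b"#!/bin/sh\necho hello\n"

print(starts_with_shebang(sample))                  # BOM hides the "#!"
print(starts_with_shebang(strip_utf8_bom(sample)))  # visible again once stripped
```

Only a BOM at the very start is touched; a U+FEFF elsewhere in the file is left alone.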

-elf


On Sun, 9 Sep 2007, Pierpaolo Bernardi wrote:

On 9/9/07, Graham Fawcett <address@hidden> wrote:
On 9/8/07, Pierpaolo Bernardi <address@hidden> wrote:
UTF8 has no BOM.  A BOM in a utf8 file should be there only if you
put it there.

Not true.

http://en.wikipedia.org/wiki/Byte_Order_Mark

UTF8 is defined by the Unicode consortium, not by wikipedia.

See here for example: http://unicode.org/faq/utf_bom.html#29

which says that you can put a bom in a utf8 file (of course, you can
put whatever character you want in a file), but it is a character
like every other character, it has no particular meaning wrt the encoding.

Then, maybe chicken could consider U+FEFF as whitespace, to work
around this bug in scite, and maybe other broken tools.
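
A rough sketch of that workaround, in Python rather than in chicken's actual reader: leading blanks are skipped, and the BOM character U+FEFF is counted as a blank. The name `skip_leading_blank` and the sample source line are hypothetical. The explicit comparison is needed because U+FEFF is a format character, so a plain whitespace test does not match it.

```python
BOM = "\ufeff"  # the BOM character; encodes to EF BB BF in UTF-8


def skip_leading_blank(text: str) -> str:
    """Drop leading whitespace, also treating U+FEFF as whitespace."""
    i = 0
    while i < len(text) and (text[i].isspace() or text[i] == BOM):
        i += 1
    return text[i:]


# Hypothetical source file as written by an editor that adds a BOM.
raw = b"\xef\xbb\xbf(define x 1)\n"

decoded = raw.decode("utf-8")        # the plain codec keeps the BOM as U+FEFF
print(repr(decoded[0]))
print(repr(skip_leading_blank(decoded)))
```

Python's own `utf-8-sig` codec takes the other route and drops one leading BOM during decoding, which would be the equivalent of handling it in the port rather than in the reader.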

P.
