bug-gnu-libiconv

Re: [bug-gnu-libiconv] Cygwin iconv (GNU libiconv 1.13) - possible bug


From: Bruno Haible
Subject: Re: [bug-gnu-libiconv] Cygwin iconv (GNU libiconv 1.13) - possible bug
Date: Thu, 1 Sep 2011 09:34:48 +0200
User-agent: KMail/1.13.6 (Linux/2.6.37.6-0.5-desktop; KDE/4.6.0; x86_64; ; )

Hi,

Foucault, Heather wrote:
> I am trying to convert a file on my windows XP machine running cygwin.
> I need the file to be UTF-16LE. However, iconv does not write the byte
> order marks when specifying the endian-ness.
> 
> $ echo "abc" | iconv -f ASCII -t UTF-16 |  od -c
> 0000000 376 377  \0   a  \0   b  \0   c  \0  \n
> 0000012
> 
> $ echo "abc" | iconv -f ASCII -t UTF-16LE |  od -c
> 0000000   a  \0   b  \0   c  \0  \n  \0
> 0000010
> 
> $ echo "abc" | iconv -f ASCII -t UTF-16BE |  od -c
> 0000000  \0   a  \0   b  \0   c  \0  \n
> 0000010

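This is all correct. The first command could equally well have produced
0000000 377 376   a  \0   b  \0   c  \0  \n  \0
0000012
(a little-endian byte order mark followed by little-endian data), but that
does not really matter: either way the output starts with a byte order mark
that tells the reader which byte order was used.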

> why does the standard UTF-16 give the BOM "376 377" , but the other two
> do not ?

When you say UTF-16, the receiver / reader of the file or stream will
not know the byte order. Therefore a byte order mark is emitted.

When you say UTF-16BE or UTF-16LE, it is understood that you will transmit
this character encoding label to the receiver / reader, and therefore no
byte order mark is needed.
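Python's standard codecs happen to follow the same convention, so the three `od -c` dumps quoted above can be reproduced outside of Cygwin. This is a minimal sketch for illustration only; note one assumption-worthy difference from iconv: Python's unlabeled "utf-16" codec writes the BOM and the data in the machine's native byte order, so on x86 it produces the `377 376` variant rather than `376 377`.

```python
import sys

text = "abc\n"

# Unlabeled "utf-16": a BOM is prepended so a reader can recover the byte
# order. Python uses the machine's native order (sys.byteorder) for both.
unlabeled = text.encode("utf-16")

# Explicit little/big endian: the label itself carries the byte order,
# so no BOM is written.
little = text.encode("utf-16-le")
big = text.encode("utf-16-be")

print(sys.byteorder, unlabeled)
print(little)  # b'a\x00b\x00c\x00\n\x00'
print(big)     # b'\x00a\x00b\x00c\x00\n'
```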

This is all specified in RFC 2781 <http://www.ietf.org/rfc/rfc2781.txt>.
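The reader's side of these rules can be sketched as follows. `decode_utf16_label` is a hypothetical helper, not part of iconv or any library; it assumes the RFC 2781 behaviour: for the bare "UTF-16" label, honour and strip a leading BOM and fall back to big-endian when there is none; for "UTF-16BE" and "UTF-16LE", let the label alone decide, so a leading U+FEFF is kept as an ordinary character (ZERO WIDTH NO-BREAK SPACE) rather than treated as a BOM.

```python
def decode_utf16_label(data: bytes, label: str) -> str:
    """Decode UTF-16 bytes according to the charset label, per RFC 2781."""
    label = label.upper()
    if label == "UTF-16BE":
        # Byte order comes from the label; a leading FE FF stays as U+FEFF.
        return data.decode("utf-16-be")
    if label == "UTF-16LE":
        return data.decode("utf-16-le")
    if label != "UTF-16":
        raise ValueError(f"not a UTF-16 label: {label!r}")
    # Bare "UTF-16": inspect the first two bytes for a byte order mark.
    if data[:2] == b"\xfe\xff":
        return data[2:].decode("utf-16-be")
    if data[:2] == b"\xff\xfe":
        return data[2:].decode("utf-16-le")
    # No BOM: RFC 2781 recommends assuming big-endian.
    return data.decode("utf-16-be")
```

Both BOM variants discussed in this thread decode to the same text:

```python
decode_utf16_label(b"\xfe\xff\x00a\x00b\x00c\x00\n", "UTF-16")
decode_utf16_label(b"\xff\xfea\x00b\x00c\x00\n\x00", "UTF-16")
```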

> Do I have to explicitly add it ?

If you know that you want little-endian UTF-16, but you will transmit to
the receiver of the file only the label "UTF-16" and not "UTF-16LE" (or
even no label at all), then you need to add the byte order mark yourself,
to compensate for the loss of information.
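A minimal sketch of that compensation, again in Python for illustration: encode as little-endian without a BOM (which is what `iconv -t UTF-16LE` produces) and prepend the little-endian byte order mark FF FE (`377 376` in `od -c` notation) yourself. `codecs.BOM_UTF16_LE` is the standard-library constant for those two bytes; the function name `to_utf16le_with_bom` is hypothetical.

```python
import codecs


def to_utf16le_with_bom(text: str) -> bytes:
    """Little-endian UTF-16 prefixed with a BOM, for readers told only 'UTF-16'."""
    # "utf-16-le" writes no BOM, so add FF FE explicitly to record the order.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


data = to_utf16le_with_bom("abc\n")
print(data)  # b'\xff\xfea\x00b\x00c\x00\n\x00'
```

The result is byte for byte the alternative first-command output shown earlier. From the shell, the same two bytes can be written with `printf '\377\376'` immediately before the output of `iconv -f ASCII -t UTF-16LE`.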

In the other cases, you don't need to add anything. It all depends on
what you tell the receiver / reader of the file about its encoding.

Bruno
-- 
In memoriam Nikolai Bryukhanov <http://en.wikipedia.org/wiki/Nikolai_Bryukhanov>