Re: [bug-gnu-libiconv] iconv issue

From:	Bruno Haible
Subject:	Re: [bug-gnu-libiconv] iconv issue
Date:	Sat, 01 Oct 2016 17:23:45 +0200
User-agent:	KMail/4.8.5 (Linux/3.8.0-44-generic; KDE/4.8.5; x86_64; ; )

Hi,

Kenneth Nellis wrote on 2016-06-10:
> $ file f
> f: exported SGML document, UTF-8 Unicode (with BOM) text, with CRLF line 
> terminators
> ...
> Accordingly, it seems strange, perhaps a bug?, that the former of the 
> following two lines doesn't work, but the latter does:
> 
> $ cat f | iconv -f UTF-8 -t Latin1 > x
> iconv: (stdin):1:0: cannot convert
> $ cat f | iconv -f UTF-8 -t UTF-16 | iconv -f UTF-16 -t Latin1 > x
> $

The output of the 'file f' command shows that the contents of f start with a
U+FEFF character. According to RFC 3629 [1] section 6:

  "It is therefore RECOMMENDED to avoid stripping an initial
   U+FEFF interpreted as a signature without a good reason, to ignore it
   instead of stripping it when appropriate (such as for display) and to
   strip it only when really necessary."

It is therefore OK that iconv does not strip away the leading U+FEFF character.
Since U+FEFF has no equivalent in Latin1, the first command line fails with
"cannot convert".

The second command line succeeds because the 'iconv -f UTF-8 -t UTF-16' command
leaves the U+FEFF character in place and the 'iconv -f UTF-16 ...' command
then strips it away: the UTF-16 decoder treats U+FEFF as a byte-order mark and
consumes it instead of passing it on as a character.
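
If the BOM is not wanted in the Latin1 output, one possible workaround is to
remove the three BOM bytes (EF BB BF) before the conversion. The following is
only a minimal sketch, assuming a POSIX shell with sed, od and iconv available;
the sample file contents and the temporary directory are made up for
illustration and are not from the original report:

```shell
# Work in a scratch directory so no existing files named f or x are touched.
cd "$(mktemp -d)"

# Hypothetical sample: UTF-8 BOM (EF BB BF), then "café" with a CRLF terminator.
printf '\357\273\277caf\303\251\r\n' > f

# Build the BOM as raw bytes, then delete it only if it starts the first line.
# LC_ALL=C makes sed match byte by byte instead of decoding characters.
bom=$(printf '\357\273\277')
LC_ALL=C sed "1s/^$bom//" f | iconv -f UTF-8 -t Latin1 > x

# Dump the result as hex to confirm that no BOM bytes remain.
od -An -tx1 x
```

Because the substitution is anchored to the start of line 1, a file without a
BOM passes through unchanged, so the same pipeline can be applied to both kinds
of input.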

Yes, I know such BOMs frequently occur in XML files written by Windows tools,
because some Windows developers have/had the mindset that a BOM was a good
thing, when in fact it is a bad thing (in the case of UTF-8).

Bruno

[1] https://tools.ietf.org/html/rfc3629
