bug-gnu-libiconv



From: Jason Pyeron
Subject: [bug-gnu-libiconv] [bug #55609] Add support for BOM exceptions as indicated in RFC 2781 section 3.3
Date: Wed, 30 Jan 2019 23:07:58 -0500 (EST)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36

Follow-up Comment #8, bug #55609 (project libiconv):

I do not know how to say this more politely, but you are wrong.

The RFC envisions exactly such a format: one where the BOM is required, on a
system where the encoding is Little Endian.

Take a simple example: a "digital signature data application" (DSDA). The
SHA256 sum of a UTF-16 stream in Big Endian is different than that in Little
Endian. It is allowable, per the RFC, for the author of a format to REQUIRE a
BOM and to REQUIRE Little Endian. This DSDA is the recipient of UTF-16,
UTF-16BE, or UTF-16LE data. It follows the RFC, and so do its providers. When
data comes in labeled, it calls iconv to convert it to UTF-16. If the input is
not labeled, it assumes it is UTF-16 and verifies that assumption by checking
the first 2 bytes.
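
To make the two points above concrete, here is a minimal sketch in Python. It
is not iconv and not the DSDA itself: Python's codecs stand in for the
converter, and the function name sniff_utf16_byte_order and the sample text
are assumptions made up for illustration. It shows that the same text yields
different SHA256 sums in the two byte orders, and what "checking the first 2
bytes" could look like.

```python
import hashlib

BOM_BE = b"\xfe\xff"  # U+FEFF serialized big-endian
BOM_LE = b"\xff\xfe"  # U+FEFF serialized little-endian


def sniff_utf16_byte_order(data: bytes) -> str:
    """Hypothetical helper: classify a stream by its first two bytes."""
    if data[:2] == BOM_BE:
        return "be"
    if data[:2] == BOM_LE:
        return "le"
    return "none"


text = "signed text"  # illustrative payload only

# The same characters, serialized with a BOM in each byte order.
stream_be = BOM_BE + text.encode("utf-16-be")
stream_le = BOM_LE + text.encode("utf-16-le")

# A signature covers bytes, not characters, so the digests differ.
digest_be = hashlib.sha256(stream_be).hexdigest()
digest_le = hashlib.sha256(stream_le).hexdigest()
```

Both streams decode to the same text, but a signature made over one will not
verify against the other, which is why a format may need to pin the byte order.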

But there is a big problem: iconv cannot support this application, because it
cannot write the UTF-16 data in Little Endian format as an unlabeled UTF-16
file. That violates the DSDA document format, which mandates a BOM in UTF-16
text and thereby requires the use of the "UTF-16" tag only.
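
For clarity, the byte layout being asked for is sketched below in Python. This
is only an illustration of the desired output, not a proposed iconv interface;
encode_utf16_le_with_bom is a hypothetical name, and Python's "utf-16-le"
codec stands in for a BOM-less little-endian converter.

```python
import codecs


def encode_utf16_le_with_bom(text: str) -> bytes:
    """Hypothetical helper: little-endian UTF-16 that starts with a BOM."""
    # "utf-16-le" emits no BOM, so the FF FE mark is prepended by hand.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


payload = encode_utf16_le_with_bom("A")
```

A receiver that reads this as "UTF-16" sees the FF FE mark first and decodes
the rest as little-endian.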

I do not see how following the RFC encourages anyone to not follow the RFC. 

By your logic, iconv should never have comments like

/* Here we accept FFFE/FEFF marks as endianness indicators everywhere
   in the stream, not just at the beginning. (This is contrary to what
   RFC 2781 section 3.2 specifies 
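
The behavior that comment describes can be sketched as follows. This is a
Python illustration written for this message, not libiconv's code; the
function name decode_utf16_bom_anywhere is hypothetical, and starting in
big-endian order when no mark has been seen is an assumption taken from the
RFC's default.

```python
def decode_utf16_bom_anywhere(data: bytes, default_big_endian: bool = True) -> str:
    """Hypothetical decoder: treat FEFF/FFFE as byte-order marks anywhere."""
    if len(data) % 2:
        raise ValueError("UTF-16 input must have an even number of bytes")
    big_endian = default_big_endian
    units = bytearray()
    for i in range(0, len(data), 2):
        order = "big" if big_endian else "little"
        unit = int.from_bytes(data[i:i + 2], order)
        if unit == 0xFEFF:
            continue  # mark in the current byte order: drop it
        if unit == 0xFFFE:
            big_endian = not big_endian  # byte-swapped mark: switch order
            continue
        units += unit.to_bytes(2, "big")  # normalize to big-endian
    # Surrogate pairs are resolved by the final decode step.
    return units.decode("utf-16-be")
```

Note that under this sketch a U+FEFF in the middle of the text is consumed as
a mark rather than passed through as a character.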


Further, by your logic, it is not desirable that iconv encourages the receiver
of a UTF-16 encoded text to ignore the BOM and treat the remaining text as
_big_-endian encoded.

I am not asking you to make the patch. I am only asking for the opportunity to
have a patch honestly considered, one that helps applications that need to
convert between internal string representation and external string
representation when they are doing I/O.

Does what I propose violate the RFC?

Does what I propose violate a fundamental design restriction of iconv?

Respectfully,

Jason Pyeron

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?55609>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/
