
Re: [bug-gnu-libiconv] Please restore "UTF8" as alias for UTF-8 charset


From: Bruno Haible
Subject: Re: [bug-gnu-libiconv] Please restore "UTF8" as alias for UTF-8 charset
Date: Sat, 12 Jan 2019 10:54:49 +0100
User-agent: KMail/5.1.3 (Linux/4.4.0-141-generic; KDE/5.18.0; x86_64; ; )

Hi Stuart,

> I don't think I could successfully 
> argue your vision (that "UTF8" MUST NOT be accepted) to the maintainers 
> of glibc, newlib, uclibc, musl, Bionic, FreeBSD, NetBSD and Cygwin

I did not say that other software that already supports "UTF8" must
stop supporting it. That would cause backward compatibility problems.

I did say that the best answer to requests to support non-standard aliases
is to say NO. Such requests cause interoperability problems regarding the
use of that alias. And then, when after 5 or 10 years all software has
finally been upgraded to support the alias, thus resolving the
interoperability problems, the same game starts again with another alias
(for the same or for another encoding).

> The standards authority for iconv_open() is the Open Group, not IANA. 
> Per the standard, encoding names are implementation-defined 
> (https://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv_open.html) 
> therefore, an implementation can be as helpful, compatible or otherwise 
> as it would like to be.

The Open Group is stating that they are not standardizing the encoding
names supported by iconv_open. The one and only standard in this area
is thus IANA.

> If you're not willing to create an alias, would you be willing to 
> support Unicode Technical Standard #22, section 1.4? 
> https://www.unicode.org/reports/tr22/tr22-8.html#Charset_Alias_Matching

The "ignore case" rule is certainly good. The other rules in this section,
however, make software not future-proof: if a program decides that
it should treat "Latin-1" like "Latin1", and later an encoding or alias
named "Latin-1" actually gets introduced, you have a problem.

I decided to make libiconv future-proof.
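
To illustrate the concern, here is a minimal Python sketch of loose alias
matching as I read UTS #22 section 1.4 (drop everything except ASCII letters
and digits, lowercase, then delete each 0 that is not preceded by a digit).
The function name loose_key is hypothetical and this is not code from
libiconv; it only shows that distinct spellings collapse to the same key.

```python
import re


def loose_key(name):
    """Return a loose-matching key for a charset name (illustrative only)."""
    # Keep only ASCII letters and digits, and ignore case.
    stripped = re.sub(r"[^A-Za-z0-9]", "", name).lower()
    out = []
    for ch in stripped:
        # Delete a '0' unless the character kept just before it is a digit.
        if ch == "0" and not (out and out[-1].isdigit()):
            continue
        out.append(ch)
    return "".join(out)


# Under loose matching these spellings become indistinguishable, so a
# future, different "Latin-1" could no longer be told apart from "Latin1".
print(loose_key("Latin-1") == loose_key("Latin1"))
print(loose_key("UTF8") == loose_key("UTF-8"))
```

Once two spellings share a key, an implementation using this rule cannot
later assign them to different encodings without breaking existing callers.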

> Alternatively, would you consider following the WHATWG encoding 
> standard, https://encoding.spec.whatwg.org/#names-and-labels -- not only 
> do they mandate that web page authors MUST use "utf-8" as the encoding 
> name, because that is the correct name (lowercased), they also mandate 
> that web browsers MUST accept "utf8" as an alias for "utf-8".

The WHATWG spec is meant for web pages and web browsers. It has no
immediate force on iconv_open, since iconv's primary use is not for
web pages.

> Looks like 
> the pressure got so bad that all the world's major web browsers agree to 
> accept "utf8".

Yes, I agree, for web pages it surely makes sense.

> I would gladly accept it if libiconv's documentation made very clear 
> that "UTF-8" is the standard name for the encoding

The 'iconv -l' output makes it clear:
$ iconv -l | grep -i utf
UTF-8
UTF-16
UTF-16BE
UTF-16LE
UTF-32
UTF-32BE
UTF-32LE
UNICODE-1-1-UTF-7 UTF-7 CSUNICODE11UTF7

And the documentation as well:
https://www.gnu.org/software/libiconv/
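
The listing above can also be checked mechanically. The following Python
sketch is only an illustration: it assumes names are compared ignoring case
but otherwise exactly, it uses the grep-filtered sample shown above rather
than the full 'iconv -l' output, and the helper names are hypothetical.

```python
# Sample copied from the grep-filtered 'iconv -l' output shown above.
# Each line groups the names of one encoding, separated by spaces.
SAMPLE_LISTING = """\
UTF-8
UTF-16
UTF-16BE
UTF-16LE
UTF-32
UTF-32BE
UTF-32LE
UNICODE-1-1-UTF-7 UTF-7 CSUNICODE11UTF7
"""


def listed_names(listing):
    """Collect every name in the listing, uppercased for case-insensitive lookup."""
    names = set()
    for line in listing.splitlines():
        names.update(word.upper() for word in line.split())
    return names


def is_listed(name, listing=SAMPLE_LISTING):
    """Return True if name appears in the listing, ignoring case only."""
    return name.upper() in listed_names(listing)


print(is_listed("utf-8"))  # spelled as listed, apart from case
print(is_listed("UTF8"))   # hyphen missing, so not in the listing
```

An application that receives charset names from elsewhere could use a check
of this kind to detect a non-listed spelling before calling iconv_open.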

Bruno



