bug-gnu-libiconv

Re: [bug-gnu-libiconv] Please restore "UTF8" as alias for UTF-8 charset


From: Stuart Caie
Subject: Re: [bug-gnu-libiconv] Please restore "UTF8" as alias for UTF-8 charset
Date: Thu, 10 Jan 2019 22:05:00 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1


On 10/01/2019 19:49, Bruno Haible wrote:
>> please consider restoring "UTF8" as an alias for "UTF-8"
> Declined. "UTF-8" is the only standardized name for this encoding.
> Adding new aliases, such as "UTF8", only increases the pressure
> on other software to support this alias as well, and until all
> software supports the alias, the effect you have is: interoperability
> problems.
>
> Therefore the best answer to a request to support new aliases is: NO.

The standards authority for iconv_open() is the Open Group, not IANA. Per the standard, encoding names are implementation-defined (https://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv_open.html). Therefore, an implementation can be as helpful, compatible or otherwise as it would like to be.

If you're not willing to create an alias, would you be willing to support Unicode Technical Standard #22, section 1.4? https://www.unicode.org/reports/tr22/tr22-8.html#Charset_Alias_Matching

Like IANA, TR-22 also has no authority over iconv_open(), but I like its rationale: "these rules are in place because in practice implementations are faced with many gratuitous variations in the use and omission of punctuation".

This approach is already taken by the Bionic iconv_open() implementation, and is allowed by the POSIX standard. If you're not willing to write this yourself, would you accept a patch implementing it?
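To make the TR-22 proposal concrete, here is a minimal sketch of the matching rule as I read it: delete everything except ASCII letters and digits, lowercase the letters, then delete each 0 that is not preceded by a digit. The function names and the 64-byte buffer are my own illustration; this is not taken from libiconv, Bionic or ICU, and real implementations may differ in the details.

```c
#include <stddef.h>
#include <string.h>

/* Sketch of loose charset-name normalization (my reading of UTS #22, 1.4):
   1. drop every character that is not an ASCII letter or digit,
   2. map A-Z to a-z,
   3. left to right, drop each '0' that is not preceded by a digit.
   Writes at most outsize-1 characters plus a terminating NUL.
   Hypothetical helper, not part of any existing library. */
static void normalize_charset_name(const char *name, char *out, size_t outsize)
{
    size_t n = 0;
    int prev_kept_digit = 0;  /* is the last character written to out a digit? */

    if (outsize == 0)
        return;
    for (; *name != '\0' && n + 1 < outsize; name++) {
        unsigned char c = (unsigned char)*name;
        if (c >= 'A' && c <= 'Z')
            c = (unsigned char)(c - 'A' + 'a');
        if (c >= 'a' && c <= 'z') {
            out[n++] = (char)c;
            prev_kept_digit = 0;
        } else if (c >= '0' && c <= '9') {
            if (c == '0' && !prev_kept_digit)
                continue;     /* leading zero: dropped, state unchanged */
            out[n++] = (char)c;
            prev_kept_digit = 1;
        }
        /* any other character (punctuation, spaces) is ignored */
    }
    out[n] = '\0';
}

/* Two names match if their normalized forms are identical.
   Names longer than 63 significant characters are truncated (sketch only). */
static int charset_names_match(const char *a, const char *b)
{
    char na[64], nb[64];
    normalize_charset_name(a, na, sizeof na);
    normalize_charset_name(b, nb, sizeof nb);
    return strcmp(na, nb) == 0;
}
```

With this rule, "UTF-8", "utf8" and "UTF_8" all normalize to "utf8", while "UTF-80" normalizes to "utf80" and so is not confused with UTF-8.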

Alternatively, would you consider following the WHATWG encoding standard, https://encoding.spec.whatwg.org/#names-and-labels -- not only do they mandate that web page authors MUST use "utf-8" as the encoding name, because that is the correct name (lowercased), they also mandate that web browsers MUST accept "utf8" as an alias for "utf-8". Looks like the pressure got so bad that all the world's major web browsers agree to accept "utf8".

I would gladly accept it if libiconv's documentation made very clear that "UTF-8" is the standard name for the encoding... but did recognise "utf8" as an alternative name for UTF-8 so as to be compatible with at least nine other iconv() implementations, and all the software written to them without libiconv in mind. I don't think I could successfully argue your vision (that "UTF8" MUST NOT be accepted) to the maintainers of glibc, newlib, uclibc, musl, Bionic, FreeBSD, NetBSD and Cygwin, and thus we will remain at an impasse. I would rather write to you and dietlibc's maintainer and ask you both to support "UTF8". It's an acceptable compromise for the entire world wide web. Why is it not an acceptable compromise for libiconv?

>> which you made HPUX-only with the release of libiconv 1.13.
> You are misreading the source. GNU libiconv did not support the alias
> name "UTF8" (other than on HP-UX) in any release.

I stand corrected.

Regards
Stuart


