Re: [bug-gnu-libiconv] Big5-UAO


From: aka godfat 真常
Subject: Re: [bug-gnu-libiconv] Big5-UAO
Date: Sat, 27 Nov 2010 02:02:03 +0800

Hi,

2010/11/24 oCameLo <address@hidden>:
> ---------- Forwarded message ----------
> From: Bruno Haible <address@hidden>
> Date: 2010/11/24
> Subject: Re: [bug-gnu-libiconv] Big5-UAO
> To: address@hidden
> Cc: oCameLo <address@hidden>
> oCameLo wrote:
>> Also, would you please consider adding Big5-UAO to libiconv (not in
>> the extra encodings)?
>>
>> Big5-UAO is not an official standard, but it's the most used Big5
>> variant in Taiwan after CP950.
>>
>> Ruby has already added it: http://redmine.ruby-lang.org/issues/show/1784
>
> Can you please show evidence that this encoding is wide-spread in
> Taiwan? A Google search for
>   "big5-uao" -ruby
> does not turn up much information that I could understand. When you
> compare the hit counts
>   "big5-uao"              -> 1950
>   "big5-uao" -ruby        ->  322
> it seems this encoding is nearly only known among Ruby programmers
> and not elsewhere.

I think what I want to say is already in Ruby's issue
tracker, but allow me to rephrase it here. I hope this will
be more convincing.

I will also ask one of the authors of Big5-UAO whether
there's anything he can help with. I hope he'll be glad to see
Big5-UAO in libiconv, which I suppose is the best-known
transcoding library.

*

I don't have any evidence of how many people use Big5-UAO
(see the bottom for some Big5-UAO references),
mainly because almost all members of the Big5 family claim
to be exactly Big5, without any suffix or prefix. So when a text
is labeled as Big5, we can (almost) never know which Big5
it actually is.
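
As an illustration of this ambiguity, here is a minimal Python sketch
that decodes the same bytes with the three Big5-family codecs shipped
in Python's standard library (big5, cp950, big5hkscs). Python has no
Big5-UAO codec, so it is not included, and decode_candidates is a
name made up for this example.

```python
# Decode the same bytes with every Big5-family codec in Python's stdlib.
# Big5-UAO is not among them, so it cannot be tried here.
BIG5_FAMILY = ("big5", "cp950", "big5hkscs")

def decode_candidates(data):
    """Return {codec name: decoded text, or None if decoding failed}."""
    results = {}
    for name in BIG5_FAMILY:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results

# Characters from the common core of Big5.
common = "中文".encode("big5")
for name, text in decode_candidates(common).items():
    print(name, repr(text))
```

For characters in the common core all three codecs agree, which is
why the bare label "Big5" says so little; the variants only diverge
in their extension areas.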

As for usage, I think the most used Big5 here in
Taiwan is CP950 nowadays, legacy data included.
This is because CP950 is the default encoding in Windows
for Traditional Chinese, and Windows definitely has the majority.
The problem is that in the old days, before Unicode, we were not
satisfied with CP950: we needed more characters, even Japanese
characters.

So tons of Big5 extensions were invented (and actually, they
are all based on CP950). One of the most famous
is 櫻花輸入法 (Sakura input method?). It's an input method
that can produce Big5 hiragana, katakana, etc. Big5-UAO
is a project that tried to merge all of that mess
(that it also contributed to the mess itself is another story...).

Actually, Big5-UAO is not an official name; it's my own name for it
(in Ruby's issue tracker). That's why you can't find much
by searching for "big5-uao". The official name is "Unicode 補完計畫"
(or Unicode-At-On in English), a piece of software that complements
Unicode (but actually it's yet another member of the Big5 family,
one that is Unicode-aware). Many people in Taiwan thought UAO was
Unicode at that time, but it's not.

For now, I'll use the term UAO for the software, and
Big5-UAO for its encoding.

UAO was successful in those old days, if you ask me.
It was once a must-install on Windows
in Taiwan. It solved the messy Big5 (CP950) extension
issues. I remember that once I had installed it, all the Japanese
characters in the wild started displaying. It worked like a charm
because it replaced the code page table in Windows with its
own table. Because of this, other people also needed to install
UAO on their computers to display characters produced
on computers that had UAO installed. This is why it spread
so fast and so widely.

Today, I think UAO is no longer maintained, and no one would
suggest installing that software, because of its intrusive
modification of the system's code page table. But Big5-UAO is
still very important nowadays: the encoding is still used in some
places where we can't use, or can't afford to use, Unicode.

The largest one is http://www.ptt.cc/index.html
It's a web mirror of a BBS ( telnet://ptt.cc (or ssh://address@hidden ))
The reason it still uses Big5-UAO is that the traffic is too large.
The average number of online connections is at least 120k, and
the maximum is about 150k. The site is hard-coded for Big5
(with or without UAO) for performance reasons. And I think
almost all the BBS clients people use bundle their own
Big5-UAO table. So people who have not installed UAO are still
using Big5-UAO... through a specialized BBS client (e.g. PCMan,
Pietty, Nally, Zterm.java, etc.). Every client bundles its own
Big5-UAO table, which is bad.

On the other hand, the big5-to-unicode table in Big5-UAO is a
superset of the one in CP950, so it doesn't hurt to use
Big5-UAO to decode CP950-encoded text. In fact, that's
also the table Mozilla Taiwan has chosen for decoding Big5.
Their Big5 is actually a combination: Big5-UAO for
big5-to-unicode, and for unicode-to-big5 a specialized
table based on CP950 plus some extensions from Big5-UAO
and Big5-2003. I might want to call it Big5-Moz...[1] but anyway.
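
The split described here, one table for decoding and a different one
for encoding, could be sketched in Python roughly as follows. This is
a simplification, not Mozilla's actual implementation: the tables are
toy data (0xA4A4 is 中 in Big5; the 0x8140 entry is a made-up
extension mapping, not a real Big5-UAO value), the encode table is
plain CP950 without the extra extensions, and all function names are
hypothetical.

```python
# Toy tables; real ones (e.g. the moz18 files in [1]) are far larger.
CP950_B2U = {0xA4A4: 0x4E2D}                  # 0xA4A4 -> U+4E2D (中)
UAO_B2U = {0xA4A4: 0x4E2D, 0x8140: 0x3042}    # second entry is made up

def is_superset(big, small):
    """True if every mapping in `small` is present and identical in `big`."""
    return all(big.get(code) == ucs for code, ucs in small.items())

def make_codec(b2u, u2b):
    """Build decode/encode functions from two independent tables."""
    def decode(codes):
        return "".join(chr(b2u[c]) for c in codes)

    def encode(text):
        return [u2b[ord(ch)] for ch in text]

    return decode, encode

# Decode with the larger table, encode only with the conservative one.
CP950_U2B = {ucs: code for code, ucs in CP950_B2U.items()}
decode, encode = make_codec(UAO_B2U, CP950_U2B)

print(is_superset(UAO_B2U, CP950_B2U))
print(decode([0xA4A4, 0x8140]))
print(encode("中"))
```

With this arrangement, decoding accepts everything the larger table
knows, while encoding stays within what the smaller table covers;
encode raises KeyError for any character outside the encode table.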

*

I think the only important Big5 variants nowadays are CP950, Big5-UAO,
and Big5-HKSCS (but actually I don't really know much about Big5-HKSCS;
I just use it to decode Big5-UAO-encoded text when no
Big5-UAO decoder is available. It's not accurate, but at least it can
decode some of the Japanese characters). The most widely used Big5 should
be CP950, and Big5-UAO is more or less a superset of it.
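
The Big5-HKSCS fallback mentioned above could look like this in
Python. It is only a sketch: it assumes a codec registered under the
name "big5-uao", which Python's standard library does not provide
(someone would have to register one), and pick_codec and
decode_uao_text are hypothetical helper names.

```python
import codecs

def pick_codec(preferred="big5-uao", fallback="big5hkscs"):
    """Return `preferred` if a codec is registered under it, else `fallback`."""
    try:
        codecs.lookup(preferred)
        return preferred
    except LookupError:
        return fallback

def decode_uao_text(data):
    """Decode Big5-UAO text, approximating with Big5-HKSCS if needed."""
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    return data.decode(pick_codec(), errors="replace")

print(pick_codec())
print(decode_uao_text("中文".encode("big5")))
```

The fallback result is only an approximation: bytes from the
extension areas may decode to different characters than Big5-UAO
intended, or to U+FFFD.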

No one really uses Big5-2003. And according to Mozilla Taiwan[0],
*don't use* Big5-1984, Big5-ETEN, the Big5 from the Unicode website
(only better than the worst), Big5+ (the worst), or Big5-E.

As for backward compatibility, I agree that we shouldn't
change too much in Big5. But for people in Taiwan using Windows,
the current Big5 in libiconv isn't really useful. CP950 is a lot better,
and Big5-UAO would be the best.

> Second, once you have evidence that it is wide-spread, I need an
> authoritative mapping table from/to Unicode.
>
> You say "Big5-UAO's not an official standard". This is exactly the problem.
> If something is not a standard, not only may it be extended or changed
> without notice, but - even worse - different organizations or groups may
> make different, incompatible changes to it. This is how the big mess around
> BIG5 has come into existence, in the past. You will certainly understand
> that I don't want to contribute to this mess. Especially since there is now
> a semi-official encoding called BIG5-2003.
>
> Bruno

I think Big5-UAO is no longer maintained[2], as stated above,
so its table should not change any more. And libiconv could
certainly serve as some kind of de facto standard.
I wonder why no one (or did someone?) has proposed Big5-UAO
to libiconv in all these years... I hope it is not too late.

Many thanks for listening!

cheers,
Jen-Shin

[0] http://moztw.org/docs/big5/
[1] http://moztw.org/docs/big5/table/moz18-b2u.txt
http://moztw.org/docs/big5/table/moz18-u2b.txt
http://moztw.org/docs/big5/table/moz18-b2u-strict.txt
[2] http://uao.cpatch.org/ (I can't even connect to UAO official website!)

* http://en.wikipedia.org/wiki/Big5#Unicode-at-on
* http://zh.wikipedia.org/zh-tw/Unicode補完計畫


