[Groff] FW: Comp-Software-International Digest #763


From: Ted Harding
Subject: [Groff] FW: Comp-Software-International Digest #763
Date: Thu, 30 Aug 2001 12:00:49 +0100 (BST)

Hi folks,
The following just turned up on the Comp-Software-International
list. I'm forwarding it because, though it arose in a "Chinese"
context, it discusses general issues which may be of interest
to the folk who are developing, or are interested in, the grohtml
and utf8 aspects of groff.

Best wishes,
Ted.

-----FW: <address@hidden>-----

Date: Thu, 30 Aug 2001 05:13:13 EDT
From: Digestifier
<address@hidden>
To: address@hidden
Subject: Comp-Software-International Digest #763

Comp-Software-International Digest #763, Volume #3, Thu, 30 Aug 2001
05:13:13 EDT

Contents:
  Re: Unicode vs other encoding schemes (Garth Grimm)

------------------------------------------------------

From: Garth Grimm <address@hidden>
Subject: Re: Unicode vs other encoding schemes
Date: Wed, 29 Aug 2001 23:51:15 -0700

Hello Lee,

Boy, this is a really tough thing to do.  But when you're done, you'll
find you can get paid well for your skills ;-)

First, know what character set is most often used by your customers. 
For mainland China, we use 'gb2312' (Simplified Chinese).  For Hong Kong
and Taiwan, we use 'big5' (Traditional Chinese).  Unicode (notably its
UTF-8 encoding) is the base character set of HTML 4.0, and things are
moving fast that way, but it will be a little longer until widespread
use of browsers that support UTF-8 sets in.  If your audience is Chinese speakers in the
US, I'm not sure what is most prevalent.
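To make the choice concrete: the same characters become different bytes under each of these encodings, which is why a page has to say which one it uses. A minimal Python sketch using only the standard codecs; the sample text "中文" ("Chinese") is an arbitrary illustration, not something from the original message:

```python
# The same text encodes to different byte sequences under each charset.
text = "中文"  # sample text, chosen only for illustration

encoded = {name: text.encode(name) for name in ("gb2312", "big5", "utf-8")}
for name, data in encoded.items():
    print(f"{name:7} {len(data)} bytes: {data.hex(' ')}")

# Decoding with the right charset round-trips the text...
assert encoded["big5"].decode("big5") == text
# ...while reading big5 bytes as gb2312 silently yields the wrong characters.
garbled = encoded["big5"].decode("gb2312", errors="replace")
print("big5 bytes read as gb2312:", garbled)
```

Both legacy encodings use two bytes per Chinese character here, and UTF-8 uses three, so a browser cannot tell them apart reliably without a declaration.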

I assume you know how to use the <META HTTP-EQUIV="Content-type"
CONTENT="text/html; charset=big5"> in the <HEAD> element of a web page?
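The point of that tag is that the charset it names must match the encoding the file is actually written in. A minimal sketch of that rule in Python; `render_page` is a hypothetical helper for illustration, not part of any library:

```python
def render_page(body: str, charset: str) -> bytes:
    """Build a minimal HTML page whose META tag names `charset`,
    then encode the whole page in that same charset (hypothetical helper)."""
    html = (
        "<HTML><HEAD>\n"
        f'<META HTTP-EQUIV="Content-type" CONTENT="text/html; charset={charset}">\n'
        "</HEAD><BODY>\n"
        f"{body}\n"
        "</BODY></HTML>\n"
    )
    # Encoding with the declared charset keeps the declaration truthful;
    # a character the charset cannot represent raises UnicodeEncodeError.
    return html.encode(charset)


page = render_page("中文", "big5")
print(page.decode("big5"))
```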

How familiar are you with the HTTP protocol?  Do you know the difference
between HTTP headers and the HTTP body?  I bring this up because the web
server can designate, in the HTTP header, the character set in which the
HTTP body (which contains the web page) should be interpreted.  If the
web server does this, IE will use the server-designated character set,
regardless of what the <META> tag in the web page says.
This means that if you serve a web page carrying the big5 META tag above
from a server that stipulates a character set of 'gb2312', the browser
will initially try to display the page as 'gb2312', which will be
ugly.  Not many web servers do this, but be aware of it, because it may
help you find some bugs.
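The precedence described above (server header first, then the META tag) can be sketched in Python. This is only an approximation of the IE behaviour Garth describes: the META detection is a crude regular expression, and the ISO-8859-1 fallback is an assumption (it was the HTTP/1.1 default for text/* types), not something stated in the message:

```python
import re
from email.message import Message

# Crude sniffing: captures "charset=..." from the first <meta ...> tag containing one.
META_CHARSET = re.compile(rb"<meta[^>]+charset=([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def effective_charset(content_type, body, fallback="iso-8859-1"):
    # 1. A charset parameter in the HTTP Content-Type header wins.
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # lower-cased parameter, or None
        if charset:
            return charset
    # 2. Otherwise, use the charset named in the page's META tag.
    match = META_CHARSET.search(body)
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. Otherwise, fall back to an assumed default.
    return fallback
```

With a page carrying the big5 META tag, a `Content-Type: text/html; charset=gb2312` header makes this return `gb2312`, which is exactly the "ugly" case above.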

And of course, your viewers can set their browsers to always use a
certain character set rather than letting the browser choose.  There's
nothing you can do to override that.

If you want to try to get really fancy...  An HTTP request header for a
web page (from a browser) can contain a field (Accept-Charset) that
states which character sets the browser would like to receive the page
in.  In an ideal world, your server code would parse this information
out, then send back the appropriate page in the encoding that the
browser wants.  Unfortunately, many browsers don't send this at all, and
some of them (Netscape Navigator 4.x in particular) do it totally wrong.
We ended up having to use this type of approach for Russian, but
elsewhere we've just used a combination of the META tag and the web
server designation.
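The negotiation step can be sketched as follows. This is a simplified Python illustration, assuming the server keeps one copy of each page per charset in a hypothetical `available` list; it honours q-values and the `*` wildcard but ignores finer points of the HTTP specification:

```python
def choose_charset(accept_charset, available):
    """Pick the charset from `available` (non-empty, in server preference
    order) that the Accept-Charset header rates highest."""
    if not accept_charset:
        return available[0]  # no preference sent: use the server's first choice
    prefs = {}
    for item in accept_charset.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue
        q = 1.0  # a charset listed without a q-value is fully acceptable
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[name] = q
    best, best_q = None, 0.0
    for charset in available:
        q = prefs.get(charset.lower(), prefs.get("*", 0.0))
        if q > best_q:  # strictly greater, so ties keep the server's order
            best, best_q = charset, q
    # Nothing acceptable (or a misbehaving browser): fall back to the server's choice.
    return best if best is not None else available[0]


print(choose_charset("big5, gb2312;q=0.5", ["gb2312", "big5", "utf-8"]))
```

The fallback to the server's first choice mirrors the advice above: since many browsers send nothing useful, the META tag and server header still have to carry the page.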

The answer to your last question is no.  Just because a browser can
handle an encoding doesn't mean it has the proper OS fonts to display
the characters correctly.  If your users are in a country where Chinese
is spoken natively, you can probably safely assume the browsers they've
installed have been set up with the proper fonts to handle Chinese
characters.  But if you're primarily dealing with Chinese-speaking
citizens of other countries, they may have installed browsers that were
set up with Western-language fonts (Latin-1 in particular).
In that case, you'll have to hope that your users have downloaded the
necessary fonts for their browsers (or educate them to do so).

International localization on the web is an incredibly chaotic world at
the moment.

Garth
Senior Operations Engineer
hp.com Search Program
Hewlett-Packard Co.


Lyle Coder wrote:
> 
> I'm trying to set up a Chinese web site...
> I've read a little on Unicode and I'm confused in a few places.
> 
> It seems there are multiple ways I can set up my web site... I can
> choose from many different Chinese char sets, like gb2312 and a few
> others, or even Unicode.  My question is which one should I use?
> 
> Why are there so many ways to create a Chinese text page?  The same is
> true of so many languages...
> 
> If I just use Unicode, will all browsers support it?  What is the
> standard and preferred way of doing this?
> 
> Also, if I understand correctly, the browser should be capable of
> displaying the chosen subset of Unicode, right?
> 
> Thanks
> Lee

------------------------------


** FOR YOUR REFERENCE **

The service address, to which questions about the list itself and requests
to be added to or deleted from it should be directed, is:

    Internet: address@hidden

You can send mail to the entire list by posting to the
comp.software.international newsgroup.

End of Comp-Software-International Digest
*****************************************

--------------End of forwarded message-------------------------

--------------------------------------------------------------------
E-Mail: (Ted Harding) <address@hidden>
Fax-to-email: +44 (0)870 167 1972
Date: 30-Aug-01                                       Time: 12:00:49
------------------------------ XFMail ------------------------------