bug-gnu-libiconv

Re: [bug-gnu-libiconv] TILDE in Shift-jis


From: Bruno Haible
Subject: Re: [bug-gnu-libiconv] TILDE in Shift-jis
Date: Tue, 20 May 2008 01:25:57 +0200
User-agent: KMail/1.5.4

Hi,

Takemoto wrote:
> Sometimes the //TRANSLIT function of iconv does
> not produce the expected approximation, particularly with
> Japanese.
> ...
> and char(126) from utf-8 to shift-jis
> http://bugs.php.net/bug.php?id=45017
> 
> The PHP people are saying this is out of their jurisdiction, being
> a libiconv issue.

The PHP people are right to redirect you to bug-gnu-libiconv.

Shift_JIS does not contain a tilde: neither the ASCII TILDE (U+007E)
nor the FULLWIDTH TILDE (U+FF5E). You can find libiconv's mapping table
for Shift_JIS in the file libiconv/tests/SHIFT_JIS.TXT; please check it
for yourself.

> I particularly need tildes in shift_jis encoded pages/email

Nowadays, Japanese web pages are most often encoded in Microsoft's CP932
or in UTF-8. ISO-2022-JP-2 is also used, but to a lesser extent.

You can learn about the difference between Shift_JIS and CP932 here:

  http://www.haible.de/bruno/charsets/conversion-tables/Japanese.html
  under "Shift_JIS and extensions".
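
To make the CP932 side of this concrete, here is a minimal sketch in Python.
It uses Python's built-in "cp932" codec, not libiconv; Python's tables are
maintained separately and may differ from libiconv's, especially for plain
Shift_JIS, so take it only as an illustration that CP932 has room for both
tildes. The helper name try_encode is made up for this example.

```python
# Sketch: check whether Python's cp932 codec (not libiconv) can encode
# the two tilde characters discussed above.
TILDE = "\u007e"            # ASCII TILDE
FULLWIDTH_TILDE = "\uff5e"  # FULLWIDTH TILDE


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec has no mapping."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    for name, ch in (("TILDE", TILDE), ("FULLWIDTH TILDE", FULLWIDTH_TILDE)):
        print(name, "in cp932:", try_encode(ch, "cp932"))
```

A result of None would mean the codec has no mapping for that character.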

> since, following the Apache standard, I have a tilde in my URL.

In URLs you can always escape a tilde as %7E. It is a bit ugly, but when
you have character conversion problems, it is safer.

> Please see 
> http://md2.cc.yamaguchi-u.ac.jp/~eigo/temp/tilde.php

It can also be written as

  http://md2.cc.yamaguchi-u.ac.jp/%7Eeigo/temp/tilde.php
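
If you generate such URLs from a script, the escaping can be sketched as
follows. This is only an illustration in Python; the function name
escape_tilde is made up here. A plain string replacement is used because, as
far as I know, urllib.parse.quote in recent Python versions treats "~" as an
unreserved character and leaves it alone.

```python
from urllib.parse import unquote


def escape_tilde(url):
    """Replace each literal '~' in the URL with its percent-escape '%7E'."""
    return url.replace("~", "%7E")


ORIGINAL = "http://md2.cc.yamaguchi-u.ac.jp/~eigo/temp/tilde.php"
ESCAPED = escape_tilde(ORIGINAL)

if __name__ == "__main__":
    print(ESCAPED)
    # Decoding the percent-escape gives back the original URL.
    print(unquote(ESCAPED) == ORIGINAL)
```

The escaped form contains only characters that survive any of the Japanese
encodings mentioned above.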

> P.S. This email was sent encoded in Shift-JIS, and you can see the tilde.

Your mailer may surprise you: Your mail was labelled and encoded as ISO-2022-JP:

  Content-Type: text/plain;
    charset="iso-2022-jp"
  Content-Transfer-Encoding: 7bit
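
You can check what a message declares with a few lines of Python. The sketch
below feeds the headers quoted above to the standard email package and reads
back the charset; the function name declared_charset is made up for this
example. It also shows why the tilde came through: ISO-2022-JP starts out in
ASCII, so a tilde is simply the byte 0x7E. This uses Python's "iso2022_jp"
codec, not libiconv.

```python
from email import message_from_string

# The headers quoted above, followed by the empty line that ends the header block.
RAW_HEADERS = (
    "Content-Type: text/plain;\n"
    '    charset="iso-2022-jp"\n'
    "Content-Transfer-Encoding: 7bit\n"
    "\n"
)


def declared_charset(raw_message):
    """Return the charset parameter of the Content-Type header, lowercased."""
    return message_from_string(raw_message).get_content_charset()


if __name__ == "__main__":
    print(declared_charset(RAW_HEADERS))
    # A tilde in ISO-2022-JP is plain ASCII, so no escape sequence is needed.
    print("~".encode("iso2022_jp"))
```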

Bruno