lynx-dev

Re: [Lynx-dev] Japanese (JIS, EUC, Shift-JIS), uxterm


From: patakuti
Subject: Re: [Lynx-dev] Japanese (JIS, EUC, Shift-JIS), uxterm
Date: Mon, 19 Jul 2004 23:57:45 +0900 (JST)

On Mon, 19 Jul 2004, Henry Nelson wrote:

> On Sat, Jul 17, 2004 at 01:28:33AM +0900, address@hidden wrote:
> > On Sun, 11 Jul 2004, Henry Nelson wrote:
> > > > http://www.feyrer.de/JP/ ->   * [4]English <-> Japanese Dictionary...
> > > 
> > > If you're a friend of Hubert's, ask him to remove the extra charset meta
> > > at the top of his page:
> > > 
> > >     <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
> > >     <html>
> > >     <head>
> > >     <meta http-equiv="Content-Type" content="text/html; charset=euc-jp">
> > 
> > More precisely, ask him to remove the charset in the HTTP header.
> > The first META line is taken from the HTTP header.
> > 
> > Henry, please add this line to your lynx.cfg; then you should no longer
> > see the extra charset meta in the downloaded file.
> > 
> >     PREPEND_CHARSET_TO_SOURCE:FALSE
> 
> So it is Lynx that is prefixing the "extra" charset=iso-8859-1 META at
> the top of the page.  Thanks for correcting me on that point.  Also,
> apologies to Thorsten for my having added to the confusion.
> 
> BUT, now I'm more curious than ever.  Am I right to continue to assume
> this is a case of misconfiguration of the server?  

Yes, I think so.  The charset in the HTTP header has top priority in
determining the charset.  The page is written in euc-jp, but the HTTP
header indicates that the charset is iso-8859-1.

ref: http://www.w3.org/TR/html401/charset.html#h-5.2.2
  | To sum up, conforming user agents must observe the following
  | priorities when determining a document's character encoding (from
  | highest priority to lowest):
  |  1. An HTTP "charset" parameter in a "Content-Type" field.
  |  2. A META declaration with "http-equiv" set to "Content-Type" and a
  |     value set for "charset".
  |  3. The charset attribute set on an element that designates an
  |     external resource.
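The priority rule quoted above can be sketched in a few lines of Python. This is a minimal illustration of the HTML 4.01 rule only, not Lynx's actual code (Lynx is written in C); the function names charset_param and resolve_charset are hypothetical, and the charset parameter is parsed with the standard library's email.message.Message.

```python
from email.message import Message
from typing import Optional


def charset_param(content_type: Optional[str]) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # None if there is no charset parameter


def resolve_charset(http_content_type: Optional[str],
                    meta_content_type: Optional[str] = None,
                    element_charset: Optional[str] = None) -> Optional[str]:
    """Pick a document charset using the HTML 4.01 section 5.2.2 priority order."""
    candidates = (
        charset_param(http_content_type),                       # 1. HTTP header
        charset_param(meta_content_type),                       # 2. META http-equiv
        element_charset.lower() if element_charset else None,   # 3. charset attribute
    )
    for candidate in candidates:
        if candidate:
            return candidate
    return None  # nothing declared; a user agent may then fall back to a default


# The situation in this thread: the header wins over the page's own META.
print(resolve_charset("text/html; charset=iso-8859-1",
                      "text/html; charset=euc-jp"))
```

Because the HTTP header is checked first, the euc-jp META inside the page is never consulted, which is why the page comes out garbled.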

> To have Lynx render
> the page "http://www.feyrer.de/JP/" correctly (at least on my system)
> the charset meta must be the one in the header, "charset=euc-jp", not
> the one Lynx prefixes, "charset=iso-8859-1".  After downloading the page
> with Lynx, either deleting the META that Lynx prefixes, or editing it to
> "euc-jp", fixes the rendering of the Japanese.
> 
> Is there a bug in Lynx?  

No, I believe not.
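For files that were already saved with the extra META, the manual fix described above (deleting the META that Lynx prepended) can be scripted. This is a minimal sketch that assumes the prepended line has exactly the form quoted earlier in this thread (`<META HTTP-EQUIV="Content-Type" CONTENT="...">` at the very start of the file); the names PREPENDED_META and strip_prepended_meta are hypothetical.

```python
import re

# Assumption: Lynx's prepended line is the first thing in the file and looks like
# <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=...">
PREPENDED_META = re.compile(
    r'\A\s*<META\s+HTTP-EQUIV="Content-Type"\s+CONTENT="[^"]*">\s*',
    re.IGNORECASE,
)


def strip_prepended_meta(html: str) -> str:
    """Remove a Content-Type META that appears at the very start of the file."""
    return PREPENDED_META.sub("", html, count=1)


saved = (
    '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">\n'
    "<html>\n<head>\n"
    '<meta http-equiv="Content-Type" content="text/html; charset=euc-jp">\n'
)
print(strip_prepended_meta(saved))  # the page's own euc-jp META is kept
```

Note that the pattern cannot tell a Lynx-prepended META from an author's META that happens to start the file, so setting PREPEND_CHARSET_TO_SOURCE:FALSE is the cleaner fix.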

> Specifically, what should "Assumed document
> character set" in the "Display and Character Set" section of the O)ptions
> Menu do?  

Nothing in this case.

> If I change it from "iso-8859-1" to "euc-jp" there is no change
> in the rendering of the page; it is still garbled.  Shouldn't that be a
> manual override that would allow Lynx to render the page correctly?

Not in this case.  ASSUME_CHARSET has an effect only when no charset
is specified explicitly.

# Please refer to the ASSUME_CHARSET section in lynx.cfg.
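The fallback behaviour of ASSUME_CHARSET described above can be modelled in a tiny Python sketch. The function name document_charset is hypothetical and this is not Lynx's implementation; it only shows why changing the assumed charset does nothing when the server already declares one.

```python
from typing import Optional


def document_charset(declared: Optional[str], assume_charset: str) -> str:
    """Use an explicitly declared charset; ASSUME_CHARSET is only a fallback."""
    return declared.lower() if declared else assume_charset.lower()


# The header declares iso-8859-1, so setting the assumed charset to euc-jp has no effect.
print(document_charset("iso-8859-1", "euc-jp"))  # iso-8859-1
# With no declaration at all, the assumed charset is used.
print(document_charset(None, "euc-jp"))          # euc-jp
```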

> I ask because (at least my Japanized edition of) MSIE has a way to
> correct the display by manually choosing "Japanese (EUC)" under
> "Encoding(D)" in the "Display(V)" pull-down.  It would be nice if Lynx
> could do that, too.

If you build lynx with KANJI_CODE_OVERRIDE defined in userdefs.h, you may be
able to choose the document charset from AUTO/SJIS/EUC with higher priority
than the charset in the HTTP header, though I haven't tested it for some years.
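Why an override that outranks the HTTP header fixes the display can be shown with Python's standard codecs: EUC-JP bytes decoded as iso-8859-1 come out garbled, while forcing euc_jp restores the text. This is only a sketch of the idea; the OVERRIDE_CODECS table and decode_document function are hypothetical, it covers only the EUC and SJIS choices (not AUTO detection), and it is not how Lynx implements KANJI_CODE_OVERRIDE.

```python
from typing import Optional

# Hypothetical mapping from a manual override choice to Python codec names.
OVERRIDE_CODECS = {"EUC": "euc_jp", "SJIS": "shift_jis"}


def decode_document(raw: bytes, declared: str, override: Optional[str] = None) -> str:
    """Decode raw bytes; a manual override outranks the declared (header) charset."""
    codec = OVERRIDE_CODECS.get(override, declared)  # no override -> declared charset
    return raw.decode(codec, errors="replace")


page = "日本語".encode("euc_jp")  # bytes as the server actually sends them
print(decode_document(page, "iso-8859-1"))         # garbled: the header charset is used
print(decode_document(page, "iso-8859-1", "EUC"))  # 日本語
```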
--
Takeshi Hataguchi
E-mail: address@hidden



