Re: lynx-dev Line breaks and Double-byte Charsets
From: Klaus Weide
Subject: Re: lynx-dev Line breaks and Double-byte Charsets
Date: Wed, 28 Jul 1999 11:09:34 -0500 (CDT)
On Tue, 27 Jul 1999, Erik Peterson wrote:
> Hello,
>
> I write CGI programs involving Chinese. I use lynx for some of
> these programs to download a Chinese web page in a nice format
> and dump it to a text file for further processing. I call lynx like
> this:
>
> lynx -assume_charset=gb2312 -dump some_url
>
> I've recently noticed that when lynx formats the text and
> inserts line breaks, it will sometimes insert line breaks in the
> middle of a double-byte character. This messes up the
> following text until the next ASCII range letter.
>
> I've tried this on both DOS and Unix and with the latest
> release version of lynx (2.8.2) and get the same results.
One thing to try, though I don't know whether it makes a difference:
go into interactive lynx, go to the Options menu, set the display character
set to the one corresponding to -assume_charset, and save to disk.
Then try the -dump invocation again.
Also check whether using -raw makes a difference.
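
To tell whether either change helps, it may be easier to check the dump
mechanically than by eye. The following is only a rough sketch, not anything
lynx provides: it assumes the dump is GB2312 in its usual EUC-CN form, where
both bytes of a double-byte character have the high bit set, so a run of
high-bit bytes within one line should always have even length. The function
name lines_with_split_chars is made up for this illustration.

```python
import re

# Assumption: EUC-CN encoded GB2312, where every double-byte character
# consists of two bytes in the range 0x80-0xFF.
HIGH_RUN = re.compile(rb"[\x80-\xff]+")


def lines_with_split_chars(data: bytes) -> list[int]:
    """Return 1-based numbers of lines containing an odd-length run of
    high-bit bytes, i.e. lines where a double-byte character was cut."""
    bad = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        runs = HIGH_RUN.findall(line)
        if any(len(run) % 2 for run in runs):
            bad.append(lineno)
    return bad


if __name__ == "__main__":
    import sys

    # Usage: python check_dump.py dump.txt
    with open(sys.argv[1], "rb") as f:
        for n in lines_with_split_chars(f.read()):
            print(f"line {n}: double-byte character appears to be split")
```

Note that a bad break is normally reported twice: once for the line that ends
with the orphaned first byte and once for the next line, which starts with the
orphaned second byte. Running this on the output of each -dump variant should
show whether the Options setting or -raw changes anything.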
Klaus