Re: [Bug-wget] html pasing via wget
From: Micah Cowan
Subject: Re: [Bug-wget] html pasing via wget
Date: Mon, 02 Mar 2009 12:23:35 -0800
User-agent: Thunderbird 2.0.0.19 (X11/20090105)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Роман Мартынович wrote:
> Hello!
>
> I use wget on Windows to parse html files from the Web to my pc. I live
> in Russia and so I parse Russian sites. Sometimes parsed files happen to
> be stored in the wrong encoding - they have charset=windows-1251 in their
> <meta> tag, but I have to choose the koi-8 encoding to get them to appear
> correctly in Firefox, and in MS Notepad it's impossible to change the
> encoding. I can't find the reason why. And I also cannot process these
> files in my applications.
>
> So I ask you to make it possible to choose the encoding of html files as
> an option, or, if it is a bug, to fix it.
Wget doesn't transcode files; it stores each one exactly as the server
sent it. We might add a transcoding feature at some point in the future,
but probably not any time soon. We would also like to add arbitrary
post-download filters at some point, which could probably be used to
address this sort of thing.
The real problem, though, is that whoever created the files set the meta
tag incorrectly; you should contact the site to address this problem.
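In the meantime, the transcoding can be done as a separate step after the
download. Here is a minimal sketch in Python, assuming the files really are
KOI8-R (as the Firefox behaviour suggests) and that UTF-8 is an acceptable
target. The names transcode_html and fix_file are hypothetical helpers for
illustration, not part of Wget:

```python
import re
from pathlib import Path


def transcode_html(raw: bytes, actual: str = "koi8_r", target: str = "utf-8") -> bytes:
    """Re-encode HTML bytes and make the declared charset match.

    `actual` is the encoding the bytes are really in (an assumption here:
    KOI8-R); `target` is the encoding to write out.
    """
    text = raw.decode(actual)
    # Rewrite the first charset=... declaration (normally the <meta> tag)
    # so that it names the encoding we are about to write.
    text = re.sub(
        r'(charset\s*=\s*["\']?)[\w.:-]+',
        lambda m: m.group(1) + target,
        text,
        count=1,
        flags=re.IGNORECASE,
    )
    return text.encode(target)


def fix_file(path: str) -> None:
    """Transcode one downloaded file in place (hypothetical helper)."""
    p = Path(path)
    p.write_bytes(transcode_html(p.read_bytes()))
```

Calling fix_file("index.html") on a downloaded page would rewrite it as
UTF-8 with a matching charset declaration. If the guess about the real
encoding is wrong, the output will be garbage, so keep a copy of the
original file.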
- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
Maintainer of GNU Wget and GNU Teseq
http://micah.cowan.name/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkmsQD4ACgkQ7M8hyUobTrG/awCbB/nh+SugovMYKUcDf5r0gTUa
a6YAn0vkyrXpGBmYRjPZ6DgugCWZQkRF
=3dvI
-----END PGP SIGNATURE-----