On 07/02/13 15:06, bes wrote:
Hi,
I found a bug in how wget interprets and saves URLs that contain a
percent-encoded 3-byte UTF-8 character.
Example:
1. Create a URL containing "—" (U+2014, EM DASH). Its percent-encoded
UTF-8 form is "%E2%80%94".
2. Try to fetch it: wget "http://example.com/abc—d", or
wget "http://example.com/abc%E2%80%94d" directly.
3. Wget saves this URL to the file "abc\342%80%94d". The expected name is
"abc%E2%80%94d". This is a bug.
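The problem is that wget checks each decoded byte to see whether it is a
printable character in Latin-1. 0xE2 (octal \342) is printable there, so it
is written out raw, while 0x80 and 0x94 fall in the Latin-1 control range
(0x80-0x9F) and are percent-escaped again.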
There is a bug report for this at
https://savannah.gnu.org/bugs/index.php?37564
An option would be to use --restrict-file-names=nocontrol to get the em
dash in the filename, instead of the percent-encoded version.
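
To make the Latin-1 check concrete, here is a rough Python sketch of the
behaviour. It is only a model, not wget's actual C code: the function name
saved_name and its nocontrol parameter are made up for illustration, and the
exact set of bytes treated as control characters (here 0x00-0x1F and
0x7F-0x9F) is an assumption. Real wget also escapes other characters that
this sketch ignores.

```python
from urllib.parse import unquote_to_bytes


def saved_name(segment: str, nocontrol: bool = False) -> bytes:
    """Hypothetical model of the file name derived from one URL path segment."""
    # Decode percent-escapes to raw bytes: "%E2%80%94" -> b"\xe2\x80\x94".
    raw = unquote_to_bytes(segment)
    if nocontrol:
        # Models --restrict-file-names=nocontrol: keep the bytes untouched.
        return raw
    out = bytearray()
    for byte in raw:
        # Assumed rule: bytes that are control characters when read as
        # Latin-1 get percent-escaped again; everything else is kept.
        is_control = byte < 0x20 or 0x7F <= byte <= 0x9F
        if is_control:
            out += f"%{byte:02X}".encode("ascii")
        else:
            out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    print(saved_name("abc%E2%80%94d"))  # b'abc\xe2%80%94d'
    print(saved_name("abc%E2%80%94d", nocontrol=True).decode("utf-8"))
```

The first call reproduces the mixed name from the report: the lead byte
survives and the two continuation bytes are re-escaped. With nocontrol set,
all three bytes are kept, so the name decodes back to the em dash on a UTF-8
file system.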