[Bug-wget] -m --iri unnecessarily modifies double-escapes incorrectly, w
From: Barry Allard
Subject: [Bug-wget] -m --iri unnecessarily modifies double-escapes incorrectly, whereas -m --no-iri works
Date: Sun, 27 Sep 2015 14:29:24 -0700
# skips all double-encoded [ui]ris because it reinterprets them somewhere
outside url.c:reencode_escapes(), probably in iri.c.
wget --iri -mr http://www.liteirc.net/mirrors/siyobik.info/reference.html
# works
wget --no-iri -mr http://www.liteirc.net/mirrors/siyobik.info/reference.html
Correct [ui]ri:
http://www.liteirc.net/mirrors/siyobik.info/instruction/XLAT%252FXLATB.html (200)
Incorrect [ui]ri:
http://www.liteirc.net/mirrors/siyobik.info/instruction/XLAT%2FXLATB.html (404)
# pcnt_decode(pcnt_decode("%252F") -> "%2F") -> "/"
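
To make the decoding chain concrete, here is a small Python sketch. It is not wget code; it only uses Python's standard urllib.parse.unquote to show what each extra percent-decoding pass does to the correct [ui]ri. The helper name decode_times is made up for illustration.

```python
from urllib.parse import unquote


def decode_times(url: str, times: int) -> str:
    """Percent-decode `url` the given number of times (hypothetical helper)."""
    for _ in range(times):
        url = unquote(url)
    return url


correct = "http://www.liteirc.net/mirrors/siyobik.info/instruction/XLAT%252FXLATB.html"

# One extra decode turns %252F into %2F, which is the [ui]ri that returns 404.
once = decode_times(correct, 1)

# A second decode turns %2F into a literal "/", changing the path structure.
twice = decode_times(correct, 2)

print(once)
print(twice)
```

One unwanted decode is enough to produce the 404 [ui]ri above; a second one would split the file name into two path segments.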
Simple-but-incomplete hackaround: use --no-iri
To improve compatibility with mirroring international sites, the iri code path
could approximate behavior of url.c/url_parse() by avoiding unnecessary
modification to --mirror extracted [ui]ris, possibly around the time it
adds/dequeues them to/from the queue.
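
As a rough illustration of "avoiding unnecessary modification", the Python sketch below escapes only what needs escaping and passes existing %XX sequences through untouched, so an already double-escaped [ui]ri survives unchanged. This is my assumption of the desired behavior, not a port of wget's url.c; the function name reencode_preserving_escapes is hypothetical.

```python
import re
from urllib.parse import quote

# A well-formed percent escape: "%" followed by exactly two hex digits.
_VALID_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def reencode_preserving_escapes(path: str) -> str:
    """Escape unsafe characters but never decode or re-encode existing %XX.

    Hypothetical helper for illustration only; not wget's implementation.
    """
    out = []
    pos = 0
    for match in _VALID_ESCAPE.finditer(path):
        # Escape the text between escapes; a stray "%" here becomes "%25".
        out.append(quote(path[pos:match.start()], safe="/"))
        # Copy the existing escape through verbatim.
        out.append(match.group(0))
        pos = match.end()
    out.append(quote(path[pos:], safe="/"))
    return "".join(out)


extracted = "instruction/XLAT%252FXLATB.html"
print(reencode_preserving_escapes(extracted))
print(reencode_preserving_escapes("a b%2Fc"))
```

Because valid escapes are copied through as-is, applying the function repeatedly gives the same result as applying it once, which is the property the --iri path appears to lose.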
Best,
Barry Allard