[Bug-wget] --convert-links and filenames with colons
From: Joachim Breitner
Subject: [Bug-wget] --convert-links and filenames with colons
Date: Mon, 26 Oct 2015 13:42:41 +0100
Dear wget developers,
it seems that "wget -r -k" is a bit careless when it creates relative
links that start with “something:”, because such a link is then
misinterpreted as a URL whose scheme (protocol) is “something”.
For example, downloading these two files:
/tmp/wget/input $ head *
==> file:with:colon.html <==
<html>
<body>
<a href="./file:with:colon.html">Foo</a>
<a href="./file_without_colon.html">Bar</a>
</body>
</html>
==> file_without_colon.html <==
<html>
<body>
<a href="./file:with:colon.html">Foo</a>
<a href="./file_without_colon.html">Bar</a>
</body>
</html>
with "wget -k -r" produces this output:
==> localhost:8000/file:with:colon.html <==
<html>
<body>
<a href="file:with:colon.html">Foo</a>
<a href="file_without_colon.html">Bar</a>
</body>
</html>
==> localhost:8000/file_without_colon.html <==
<html>
<body>
<a href="file:with:colon.html">Foo</a>
<a href="file_without_colon.html">Bar</a>
</body>
</html>
and the browser will not be able to follow the link to Foo, because it
reads “file:” as a URL scheme instead of as part of a file name.
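To illustrate the misinterpretation, here is a small sketch that resolves
both forms of the link with Python's urllib.parse. It only approximates
what a browser does, and the base URL is an assumption that stands in for
wherever the mirrored page is served from:

```python
from urllib.parse import urlsplit, urljoin

# Assumed location of the page that contains the links (illustrative only).
base = "http://localhost:8000/file_without_colon.html"

converted = "file:with:colon.html"    # link as rewritten by "wget -k"
original = "./file:with:colon.html"   # link as it appeared in the input


def resolve(link):
    """Resolve a link against the assumed base URL."""
    return urljoin(base, link)


# Everything before the first colon is taken as the scheme.
print(urlsplit(converted).scheme)
# The converted link is treated as an absolute "file:" URL, not joined to base.
print(resolve(converted))
# The "./" prefix keeps the link relative, so it resolves next to the page.
print(resolve(original))
```

Only the form with the leading "./" resolves to the file sitting next to
the page.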
This is a practical problem when trying to mirror a mediawiki
installation.
I suggest avoiding the issue by prefixing relative links with "./",
either always (why not?), or only when the relative file name starts with
something that looks like “foo:”.
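The second variant could look roughly like the sketch below. It is written
in Python for brevity and is not wget code: protect_relative_link is a
hypothetical helper, and it assumes its input is already known to be a
relative link, as the converted links are. RFC 3986 (section 4.2) asks for
the same thing: a first path segment containing a colon must be preceded
by a dot-segment such as "./".

```python
import re


def protect_relative_link(link: str) -> str:
    """Prepend "./" when the first path segment contains a colon.

    Hypothetical helper. Assumes `link` is a relative reference, so a
    leading "foo:" is part of a file name and not a real URL scheme.
    """
    # Links that already start with a slash or a dot-segment are safe.
    if link.startswith(("/", "./", "../")):
        return link
    # The first segment ends at the first "/", "?" or "#".
    first_segment = re.split(r"[/?#]", link, maxsplit=1)[0]
    if ":" in first_segment:
        return "./" + link
    return link


print(protect_relative_link("file:with:colon.html"))
print(protect_relative_link("file_without_colon.html"))
```

A colon that appears only after the first "/" is left alone, since it
cannot be mistaken for a scheme separator there.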
Thanks,
Joachim
--
Joachim “nomeata” Breitner
address@hidden • http://www.joachim-breitner.de/
Jabber: address@hidden • GPG-Key: 0xF0FBF51F
Debian Developer: address@hidden