From: anonymous
Subject: [Bug-wget] [bug #48424] wget fails to convert some URLs when the same file path is retrieved via more than one protocol
Date: Wed, 6 Jul 2016 19:27:25 +0000 (UTC)
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/601.6.17 (KHTML, like Gecko) Version/9.1.1 Safari/601.6.17
URL:
<http://savannah.gnu.org/bugs/?48424>
Summary: wget fails to convert some URLs when the same file
path is retrieved via more than one protocol
Project: GNU Wget
Submitted by: None
Submitted on: Wed 06 Jul 2016 07:27:23 PM UTC
Category: Program Logic
Severity: 3 - Normal
Priority: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Name: Paul Merchant
Originator Email: address@hidden
Open/Closed: Open
Discussion Lock: Any
Release: 1.18
Operating System: Mac OS
Reproducibility: Every Time
Fixed Release: None
Planned Release: None
Regression: None
Work Required: None
Patch Included: None
_______________________________________________________
Details:
If different protocols retrieve the same file path in a recursive crawl that
converts URLs, some of the referring URLs will not be rewritten. For example,
if the file a.html, in directory wget-test, contains these links:
<a href="http://myhost/wget-test/b.html">b - http</a>
<a href="https://myhost/wget-test/b.html">b - https</a>
Then a.html retrieved by this command:
wget -m --convert-links http://myhost/wget-test/a.html
will contain
<a href="http://myhost/wget-test/b.html">b - http</a>
<a href="b.html">b - https</a>
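One way to spot the symptom in a mirrored file is to list the links that still carry a scheme after conversion. The following is only a sketch: the helper name unconverted_links and the sample string are hypothetical and not part of wget. An absolute link is a problem only when its target was actually downloaded, since wget intentionally leaves links to non-downloaded files absolute.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def unconverted_links(html_text):
    """Return hrefs that still start with a network scheme (hypothetical helper)."""
    parser = HrefCollector()
    parser.feed(html_text)
    return [h for h in parser.hrefs
            if urlsplit(h).scheme in ("http", "https", "ftp")]


# The two links from the converted a.html shown above.
sample = (
    '<a href="http://myhost/wget-test/b.html">b - http</a>\n'
    '<a href="b.html">b - https</a>\n'
)
leftover = unconverted_links(sample)
```

For the sample above, only the http link is reported, which matches the behaviour described in this report.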
Since the different protocols actually refer to different servers (or ports on
the same server that may not share directory aliases), there is no guarantee
that the matching URL paths actually represent the same file. Ideally wget
should separate paths by protocol, and offer an option to ignore the protocol
when making paths so that if http and https (or ftp, or...) are known to
correspond to the same directory this can be reflected in the URL conversion.
As wget works now, the --no-clobber flag cannot be used as a workaround, as it
is incompatible with the recursive crawl.
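The separation of paths by protocol suggested above could be sketched roughly as follows. This is an illustration of the idea only, not wget's implementation: the function local_path and its ignore_protocol parameter are hypothetical names, and the directory layout (protocol, then host, then path) is an assumption.

```python
import posixpath
from urllib.parse import urlsplit


def local_path(url, ignore_protocol=False):
    """Map a URL to a local file path (hypothetical sketch, not wget code).

    By default the protocol is the first path component, so the same file
    path fetched over http and https lands in two different local files.
    With ignore_protocol=True both map to one file, for sites where the
    protocols are known to serve the same directory.
    """
    parts = urlsplit(url)
    path = parts.path.lstrip("/")
    # Assumed default document name for directory-style URLs.
    if not path or path.endswith("/"):
        path += "index.html"
    components = [parts.netloc, path]
    if not ignore_protocol:
        components.insert(0, parts.scheme)
    return posixpath.join(*components)


http_file = local_path("http://myhost/wget-test/b.html")
https_file = local_path("https://myhost/wget-test/b.html")
merged_http = local_path("http://myhost/wget-test/b.html", ignore_protocol=True)
merged_https = local_path("https://myhost/wget-test/b.html", ignore_protocol=True)
```

With separate local files per protocol, each referring URL has exactly one local target to be rewritten to; with ignore_protocol=True, both URLs resolve to the same target, so both links could be converted to b.html.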
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?48424>
_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/