[Bug-wget] Filtering of page requisites
From: Dale R. Worley
Subject: [Bug-wget] Filtering of page requisites
Date: Wed, 12 Oct 2016 10:49:44 -0400
So I've run into another version of the problem: I'm using
--page-requisites, and the requisites are getting filtered in much the
same way as redirections. However, the new fixes don't change that behavior.
The example case is that
$ wget --mirror --convert-links --page-requisites --limit-rate=20k \
--include-directories=/assignments \
http://www.iana.org/assignments/index.html
does not fetch the CSS specified by
http://www.iana.org/assignments/index.html in
<link rel="stylesheet" media="screen" href="../_css/2015.1/screen.css"/>
which is http://www.iana.org/_css/2015.1/screen.css.
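To spell out why the stylesheet falls outside the filter, here is a minimal sketch in C. It is not wget's code: resolve_relative and path_in_directory are hypothetical helpers, the resolver only understands "./" and "../" prefixes, and the directory check is a plain prefix match on whole path components (wget's own --include-directories matching does more than this).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: resolve REL against the directory of BASE_PATH.
   Handles only leading "./" and "../" segments; this is not a full
   RFC 3986 resolver.  Returns false if the result does not fit in OUT. */
bool
resolve_relative (const char *base_path, const char *rel,
                  char *out, size_t out_size)
{
  assert (base_path != NULL && rel != NULL && out != NULL);

  /* Directory part of the base: everything up to and including the last '/'. */
  const char *slash = strrchr (base_path, '/');
  size_t dir_len = slash ? (size_t) (slash - base_path) + 1 : 0;

  if (dir_len + 1 > out_size)
    return false;
  memcpy (out, base_path, dir_len);
  out[dir_len] = '\0';

  for (;;)
    {
      if (strncmp (rel, "./", 2) == 0)
        rel += 2;
      else if (strncmp (rel, "../", 3) == 0)
        {
          /* Drop the last directory component, but never climb above "/". */
          if (dir_len > 1)
            {
              dir_len--;        /* step over the trailing '/' */
              while (dir_len > 0 && out[dir_len - 1] != '/')
                dir_len--;
              out[dir_len] = '\0';
            }
          rel += 3;
        }
      else
        break;
    }

  if (dir_len + strlen (rel) + 1 > out_size)
    return false;
  strcpy (out + dir_len, rel);
  return true;
}

/* Hypothetical helper: true if PATH is DIR itself or lies below it,
   comparing whole path components ("/assignments2" is not in "/assignments"). */
bool
path_in_directory (const char *path, const char *dir)
{
  size_t n = strlen (dir);
  if (n > 0 && dir[n - 1] == '/')
    n--;                        /* ignore a trailing slash on DIR */
  return strncmp (path, dir, n) == 0 && (path[n] == '/' || path[n] == '\0');
}
```

Resolving "../_css/2015.1/screen.css" against "/assignments/index.html" yields "/_css/2015.1/screen.css", which path_in_directory reports as outside "/assignments", so a directory-based filter rejects it.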
It looks like requisite URLs are flagged by setting link_inline_p in
struct urlpos to true. If that flag is set and opt.page_requisites is set,
then test 4 of download_child (the --no-parent test) is suppressed.
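As a rough model of that existing exemption, assuming reduced stand-in structs (options_sketch and urlpos_sketch are hypothetical, and the host and scheme conditions that surround the real test are left out):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for wget's options and link records. */
struct options_sketch
{
  bool no_parent;               /* models --no-parent */
  bool page_requisites;         /* models --page-requisites */
};

struct urlpos_sketch
{
  bool link_inline_p;           /* true for inline links such as CSS and images */
};

/* True if the --no-parent test should still run for this link.
   Inline links are exempt when page requisites were requested. */
bool
no_parent_test_applies (const struct options_sketch *opt,
                        const struct urlpos_sketch *upos)
{
  assert (opt != NULL && upos != NULL);
  return opt->no_parent && !(opt->page_requisites && upos->link_inline_p);
}
```

The point of the sketch is only that the exemption is keyed on the pair (page_requisites, link_inline_p); the directory and regex filters have no equivalent exemption.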
This change seems to add the same logic as is applied to redirections:
diff --git a/src/recur.c b/src/recur.c
index 1469e31..b1f9109 100644
--- a/src/recur.c
+++ b/src/recur.c
@@ -462,6 +462,12 @@ retrieve_tree (struct url *start_url_parsed, struct iri *pi)
r = download_child (child, url_parsed, depth,
start_url_parsed, blacklist, i);
+ if (child->link_inline_p &&
+ (r == WG_RR_LIST || r == WG_RR_REGEX))
+ {
+ DEBUGP (("Ignoring decision for page requisite, decided to load it.\n"));
+ r = WG_RR_SUCCESS;
+ }
if (r == WG_RR_SUCCESS)
{
ci = iri_new ();
and it has the expected effect: the requisites for index.html are
downloaded.
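The decision rule in the change can also be read as a small pure function. This is only a sketch: reject_reason_sketch is a reduced stand-in for the real enum, SKETCH_RR_OTHER is a hypothetical placeholder for every other rejection reason, and override_for_requisite is not a function in wget.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced stand-in for wget's reject reasons: the three values named in
   the patch, plus a hypothetical catch-all for everything else. */
typedef enum
{
  WG_RR_SUCCESS,
  WG_RR_LIST,
  WG_RR_REGEX,
  SKETCH_RR_OTHER
} reject_reason_sketch;

/* Given the decision for a child link, overrule directory-list and regex
   rejections when the link is an inline page requisite. */
reject_reason_sketch
override_for_requisite (reject_reason_sketch r, bool link_inline_p)
{
  if (link_inline_p && (r == WG_RR_LIST || r == WG_RR_REGEX))
    return WG_RR_SUCCESS;
  return r;
}

/* Usage: only inline links are rescued, and only from the two filters. */
void
override_demo (void)
{
  assert (override_for_requisite (WG_RR_LIST, true) == WG_RR_SUCCESS);
  assert (override_for_requisite (WG_RR_REGEX, true) == WG_RR_SUCCESS);
  assert (override_for_requisite (WG_RR_LIST, false) == WG_RR_LIST);
  assert (override_for_requisite (SKETCH_RR_OTHER, true) == SKETCH_RR_OTHER);
}
```

Any other rejection reason passes through unchanged, so a requisite is still skipped when it was rejected for something other than the list or regex filters.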
I've attached a patch for this that includes an update to the manual page,
although that update doesn't mention the suppression of the --no-parent
test.
Dale
requisite.diff
Description: Text Data