[bug #59086] --page-requisites not always working when creating a warc f
From: Thomas Egense
Subject: [bug #59086] --page-requisites not always working when creating a warc file
Date: Wed, 9 Sep 2020 04:52:04 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0
URL:
<https://savannah.gnu.org/bugs/?59086>
Summary: --page-requisites not always working when creating a
warc file
Project: GNU Wget
Submitted by: thomasegense
Submitted on: Wed 09 Sep 2020 08:52:02 AM UTC
Category: Program Logic
Severity: 3 - Normal
Priority: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Name:
Originator Email:
Open/Closed: Open
Release: None
Discussion Lock: Any
Operating System: GNU/Linux
Reproducibility: None
Fixed Release: None
Planned Release: None
Regression: None
Work Required: None
Patch Included: None
_______________________________________________________
Details:
Url example: https://jyllands-posten.dk/
How to reproduce:
echo "https://jyllands-posten.dk/" >> url_list.txt
wget --level=1 --recursive --warc-cdx --page-requisites --warc-file=jp \
  --warc-max-size=1G -i url_list.txt
The HTML source of the page is stored in the WARC (last record), but none of
the images are downloaded, and links are not followed either, despite the
--recursive parameter.
This is probably due to an HTTPS redirect, but since the page source is
downloaded correctly, it should still be possible to follow links and
download page requisites.
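One way to confirm what actually ended up in the archive is to list the target URIs of the stored records. The following is only a sketch. It assumes wget wrote gzip-compressed WARC files (its default) and that they match the pattern jp*.warc.gz. The function name list_warc_uris is made up for illustration.

```shell
# List the distinct target URIs recorded in one or more gzip-compressed WARC files.
# Assumption: the WARCs are gzip-compressed (wget's default unless compression is disabled).
list_warc_uris() {
    # gzip -dc decompresses every gzip member in the file(s) to stdout;
    # grep -a treats binary payloads as text so the header lines still match.
    gzip -dc "$@" \
        | grep -a '^WARC-Target-URI:' \
        | tr -d '\r' \
        | sed -e 's/^WARC-Target-URI: *//' -e 's/^<\(.*\)>$/\1/' \
        | sort -u
}

# Usage (file name pattern is an assumption):
# list_warc_uris jp*.warc.gz
```

If only the start URL (and possibly its redirect target) shows up in the output, no page requisites or linked pages were written to the WARC.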
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?59086>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/