Re: Relative links for wget2
From: Tim Rühsen
Subject: Re: Relative links for wget2
Date: Sat, 4 Sep 2021 20:17:12 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.13.0
On 23.08.21 06:10, Matt Huszagh wrote:
> Tim Rühsen <tim.ruehsen@gmx.de> writes:
>> This works as expected with wget2 built from the latest git master. That
>> reminds me that we urgently need a new release.
>> If you want to build wget2 from a tarball (which is less hassle than
>> building from git master), follow the instructions at
>> <https://gitlab.com/gnuwget/wget2/#downloading-and-building-from-tarball>.
>> Don't forget to install the prerequisites beforehand.
>> Feel free to ask here if you run into trouble.
> OK, so I just tried the latest master
> (7c7bbf2c2752f1038f10fb298330fe7c93811030), and it fixed most of the
> issues I was seeing! The first URL I posted now appears as I'd
> expect. However, I'm still having some trouble with the Wikipedia
> download:
>   wget2 --robots=off --page-requisites --adjust-extension --convert-links=on \
>     https://en.wikipedia.org/wiki/EPROM
> I still can't get it to download all of the images (page requisites);
> they still point to remote URLs rather than local files. For example,
> <div class="thumb tleft"><div class="thumbinner" style="width:252px;"><a href="https://en.wikipedia.org/wiki/File:EPROM_Intel_C1702A.jpg" class="image"><img alt=""
> src="https://upload.wikimedia.org/wikipedia/commons/thumb/3/39/EPROM_Intel_C1702A.jpg/250px-EPROM_Intel_C1702A.jpg" decoding="async" width="250" height="130" class="thumbimage"
> srcset="https://upload.wikimedia.org/wikipedia/commons/thumb/3/39/EPROM_Intel_C1702A.jpg/375px-EPROM_Intel_C1702A.jpg 1.5x, https://upload.wikimedia.org/wikipedia/commons/thumb/3/39/EPROM_Intel_C1702A.jpg/500px-EPROM_Intel_C1702A.jpg 2x" data-file-width="1275" data-file-height="665" /></a> <div
> class="thumbcaption"><div class="magnify"><a href="https://en.wikipedia.org/wiki/File:Eprom.jpg" class="internal" title="Enlarge"></a></div>An Intel 1702A EPROM, one of the earliest EPROM types (1971), 256 by 8 bit. The small quartz window admits UV light for
> erasure.</div></div></div>
> Is that expected?
Sorry for the late answer.
If you add '-d -o log' to the command line, the debug output written to
the file 'log' shows why a URL was not downloaded. It can be hard to
follow, though.
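To narrow such a debug log down, a plain text filter on the host name is often enough. A minimal sketch, assuming bash and GNU grep; the helper name log_lines_for_host is hypothetical, and no particular wording of wget2's debug messages is assumed:

```shell
# Hypothetical helper: print the lines of a wget2 debug log (written with
# "-d -o log") that mention a given host, prefixed with their line numbers.
# -F treats the host as a fixed string, so the dots are not regex wildcards.
log_lines_for_host() {
  local logfile=$1 host=$2
  grep -nF -- "$host" "$logfile"
}
```

For example: log_lines_for_host log upload.wikimedia.org | less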
I think in your case it is because the domain "upload.wikimedia.org" is
not followed (so-called host spanning is off by default).
You can switch it on with -H (--span-hosts).
The option -D (--domains) gives slightly more control: it takes a
comma-separated list of the domains you want to download from.
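Put together, a command that spans hosts but stays on the two hosts seen in this thread could look like the sketch below. The function name and the default domain list are illustrative assumptions; both --span-hosts and --domains are passed rather than relying on -D alone to switch host spanning on.

```shell
# Hypothetical wrapper: fetch one page plus its page requisites, allow
# host spanning (-H) but restrict it to the listed domains (-D), and
# rewrite links in the saved HTML to point at the local copies.
fetch_page_with_requisites() {
  local url=$1
  # Default domain list is an assumption taken from the URLs in this thread.
  local domains=${2:-en.wikipedia.org,upload.wikimedia.org}
  wget2 --robots=off \
        --page-requisites \
        --adjust-extension \
        --convert-links \
        --span-hosts \
        --domains="$domains" \
        "$url"
}
```

Usage: fetch_page_with_requisites https://en.wikipedia.org/wiki/EPROM (add -d -o log to the wget2 call while debugging).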
With -H, the converted HTML looks like this:
<div class="thumb tleft"><div class="thumbinner" style="width:252px;"><a
href="https://en.wikipedia.org/wiki/File:EPROM_Intel_C1702A.jpg"
class="image"><img alt=""
src="../../upload.wikimedia.org/wikipedia/commons/thumb/3/39/EPROM_Intel_C1702A.jpg/250px-EPROM_Intel_C1702A.jpg"
decoding="async" width="250" height="130" class="thumbimage"
srcset="../../upload.wikimedia.org/wikipedia/commons/thumb/3/39/EPROM_Intel_C1702A.jpg/375px-EPROM_Intel_C1702A.jpg
1.5x,
../../upload.wikimedia.org/wikipedia/commons/thumb/3/39/EPROM_Intel_C1702A.jpg/500px-EPROM_Intel_C1702A.jpg
2x" data-file-width="1275" data-file-height="665" /></a> <div
class="thumbcaption"><div class="magnify"><a
href="https://en.wikipedia.org/wiki/File:Eprom.jpg" class="internal"
title="Enlarge"></a></div>An Intel 1702A EPROM, one of the earliest
EPROM types (1971), 256 by 8 bit. The small quartz window admits UV
light for erasure.</div></div></div>
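The ../../ prefix follows from each host being saved under its own directory: the page sits in en.wikipedia.org/wiki/ (presumably as EPROM.html, given --adjust-extension), so reaching upload.wikimedia.org/ takes two steps up. A sketch of that path arithmetic, assuming GNU coreutils realpath; the helper name is hypothetical and this is not wget2's own code:

```shell
# Hypothetical helper: given the local path of a saved page and the local
# path of a saved requisite, print the relative reference from the page's
# directory to the requisite, like the rewritten src/srcset values above.
# -m canonicalizes without requiring the files to exist.
local_relative_link() {
  local page_file=$1 target_file=$2
  realpath -m --relative-to="$(dirname -- "$page_file")" -- "$target_file"
}
```

For the 250px thumbnail, local_relative_link en.wikipedia.org/wiki/EPROM.html upload.wikimedia.org/wikipedia/commons/thumb/3/39/EPROM_Intel_C1702A.jpg/250px-EPROM_Intel_C1702A.jpg yields the same ../../upload.wikimedia.org/... path as the src attribute above.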
And here are the downloaded JPEG files:
$ tree upload.wikimedia.org/|grep EPROM
│ │ └── EPROM_Intel_C1702A.jpg
│ │ ├── 250px-EPROM_Intel_C1702A.jpg
│ │ ├── 375px-EPROM_Intel_C1702A.jpg
│ │ └── 500px-EPROM_Intel_C1702A.jpg
│ │ └── Nec_02716_EPROM.jpg
│ │ ├── 120px-Nec_02716_EPROM.jpg
│ │ ├── 180px-Nec_02716_EPROM.jpg
│ │ └── 240px-Nec_02716_EPROM.jpg
And href="https://en.wikipedia.org/wiki/File:EPROM_Intel_C1702A.jpg"
is not a page requisite, is it?
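One rough way to see which references were left pointing at the network (for example, plain <a href> links to pages that were not downloaded) is to scan the saved page for absolute http(s) URLs in href and src attributes. A sketch, assuming GNU grep; it is a regex scan rather than an HTML parser, it ignores srcset (which can hold several URLs), and the helper name is hypothetical:

```shell
# Hypothetical helper: list the distinct href="..." and src="..." values in a
# saved HTML file that still start with http:// or https://.
remaining_absolute_urls() {
  grep -oE '(href|src)="https?://[^"]*"' -- "$1" | sort -u
}
```

On the converted snippet above, only the two en.wikipedia.org/wiki/File: links would be listed; the img src values are already relative.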
(My brain is already mush; have to go afk =/)
Regards, Tim