Re: [Bug-wget] Help request: Limit recursion, but unconditionally include all media files
From: Tim Ruehsen
Subject: Re: [Bug-wget] Help request: Limit recursion, but unconditionally include all media files
Date: Tue, 22 Oct 2013 09:26:32 +0200
User-agent: KMail/4.10.5 (Linux/3.10-3-amd64; KDE/4.10.5; x86_64; ; )
On Monday 21 October 2013 12:33:10 Alexander Tobias Heinrich wrote:
> For example, I tried:
> wget --tries=3 --retry-connrefused --no-clobber --load-cookies=cookies.txt
> --convert-links --page-requisites --adjust-extension --recursive
> --include-directories /strategy/live-poker,/download
> http://www.pokerstrategy.com/strategy/live-poker
>
> This correctly downloads only the html documents I want and also gets the
> media files from the /download folder, but:
> - does not modify the html so that <img>-Tags point to the downloaded files
> (however, it does modify <a>-Tags that link to local html documents)
> - does not get media files from other domains.
>
> If for example I add --span-hosts, it simply gets too much (all documents
> from different language versions of the website that I don't need).
>
> Note: For the example URL I provided here you won't need to log in and thus
> the load-cookies option can be waived.
Hi Alexander,

Please have a look at the 'Recursive Accept/Reject Options' section of the docs.
You could set the domains to be followed by using --domains.
Also, --include-directories and/or --exclude-directories might help.
I am not sure that you can achieve your goal with a single call to Wget.
Missing files / directories could be downloaded using separate calls to Wget.
--input-file combined with --force-html and/or --base might help.
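
A minimal sketch of the "separate calls" idea, not a tested recipe for this site: the first call fetches the pages as in the quoted command, a small helper collects the <img> tags from the saved pages into one file, and a second call feeds that file back to Wget with --force-html and --base. The function names, the media.html file and the media/ output directory are made up for illustration. Relative image paths are resolved against the start URL here, which is only right for absolute and root-relative links, and the second call does not rewrite the <img> links in the pages already saved.

```shell
#!/bin/sh
# Start URL taken from the quoted command.
START_URL="http://www.pokerstrategy.com/strategy/live-poker"

# Call 1: recursive fetch limited to the wanted directories
# (flags as in the quoted command, minus --load-cookies).
fetch_pages() {
    wget --tries=3 --retry-connrefused --no-clobber \
         --convert-links --page-requisites --adjust-extension --recursive \
         --include-directories=/strategy/live-poker,/download \
         "$START_URL"
}

# Helper (hypothetical): print every <img ...> tag found in the
# *.html files below directory $1, one tag per line.
# Relies on "grep -o", which GNU and BSD grep provide.
collect_img_tags() {
    find "$1" -type f -name '*.html' \
         -exec grep -o -i -h '<img[^>]*>' {} +
}

# Call 2: treat the collected tags as an HTML input file and download
# whatever they reference, whichever host serves it.
fetch_media() {
    wget --tries=3 --no-clobber --force-html --base="$START_URL" \
         --input-file="$1" --directory-prefix=media
}

# Possible invocation, after sourcing this file:
#   fetch_pages
#   collect_img_tags www.pokerstrategy.com > media.html
#   fetch_media media.html
```

Because the input file contains only <img> tags, the second call does not follow any <a> links, so it should not pull in the other language versions that --span-hosts brought along.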
Regards, Tim