Re: [Bug-wget] Feature request: option to not download rejected files
From: Tim Rühsen
Subject: Re: [Bug-wget] Feature request: option to not download rejected files
Date: Fri, 29 Jun 2018 15:31:26 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0
On 06/29/2018 03:20 PM, Zoe Blade wrote:
> For anyone else who needs to do this, I adapted Sergey Svishchev's 1.8-era
> patch for 19.1 (one of the few versions I managed to get to compile in OS X;
> I'm on a Mac, and not the best programmer):
>
> recur.c:578
> - if (blacklist_contains (blacklist, url))
> + if (blacklist_contains (blacklist, url) || !acceptable (url))
>
> It's not ideal, but it seems to solve the problem as a temporary fix.
> Hopefully it might help someone else who needs this functionality.
Hi Zoë,
we recently had a discussion (20.6.2018 "Why does -A not work") where I
confirmed that --reject-regex works like a filter for detected URLs.
BTW, the OP of that thread wanted --reject-regex to download and parse
HTML files (and delete them afterwards if they match the rejected
regex), so the opposite of your request.
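To make the difference between the two behaviours concrete, here is a minimal sketch in C. It is not Wget's actual code: url_matches(), decide(), the enum and the is_html flag (a guess made from the URL before anything is fetched) are all hypothetical names chosen for illustration.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of Wget: true if `url` matches the
   POSIX extended regex `pattern`.  A pattern that fails to compile is
   treated as "no match".  */
bool
url_matches (const char *pattern, const char *url)
{
  regex_t re;
  bool matched;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  matched = regexec (&re, url, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}

/* Illustrative outcomes for a single URL.  */
enum action { SKIP, FETCH_AND_KEEP, FETCH_PARSE_DELETE };

/* filter_first == true:  rejected URLs are never fetched (the request
                          in this thread).
   filter_first == false: rejected HTML is still fetched so its links
                          can be followed, then deleted (what the OP of
                          the other thread asked for).
   is_html is an assumption: a guess made before the download.  */
enum action
decide (const char *reject_pattern, const char *url,
        bool filter_first, bool is_html)
{
  if (!url_matches (reject_pattern, url))
    return FETCH_AND_KEEP;
  if (filter_first)
    return SKIP;
  return is_html ? FETCH_PARSE_DELETE : SKIP;
}
```

With filter_first set, a URL matching the reject pattern is skipped without any network access. With it unset, a rejected HTML page is still downloaded for link extraction and removed afterwards.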
In Wget2 there is an extra option for this, --filter-urls. Maybe
--filter-mime-type is also worth a look.
It would be best if you could provide a small example / reproducer. A
hand-crafted HTML file would do as well.
Regards, Tim