Re: [Bug-wget] Wget follows "button" links


From:	Tim Rühsen
Subject:	Re: [Bug-wget] Wget follows "button" links
Date:	Tue, 5 Jun 2018 14:57:02 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0

On 06/05/2018 11:53 AM, CryHard wrote:
> Hey there,
> 
> I've used the following:
> 
> wget --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) 
> AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36" 
> --user=myuser --ask-password --no-check-certificate --recursive 
> --page-requisites --adjust-extension --span-hosts 
> --restrict-file-names=windows --domains wiki.com --no-parent wiki.com 
> --no-clobber --convert-links --wait=0 --quota=inf -P /home/W
> 
> To download a wiki. The problem is that this will follow "button" links, e.g 
> the links that allow a user to put a page on a watchlist for further 
> modifications. This has led to me watching hundreds of pages. Not only that, 
> but apparently it also follows the links that lead to reverting changes made 
> by others on a page.
> 
> Is there a way to avoid this behavior?

Hi,

That depends on how these "button links" are implemented.

A button may be part of an HTML FORM element, where the URL is the
value of the 'action' attribute. Wget doesn't download such URLs,
precisely because of the problem you describe.

A dynamic web page can also implement "button links" as plain links.
Wget doesn't know about such hidden semantics, so it downloads these
URLs, and requesting them may trigger changes in a database.
If this is your issue, you have to look into the HTML files and exclude
those URLs from being downloaded, or alternatively create a whitelist.
Look at the options -A/-R, --accept-regex and --reject-regex.
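
As a rough sketch of the exclusion approach: assuming the wiki is
MediaWiki-like and marks its state-changing links with an "action="
query parameter (an assumption; check your own HTML), a reject pattern
can be tried out offline before crawling. The pattern, the helper name
would_reject and the sample URLs are made up for illustration, and
grep -E only approximates how wget applies a POSIX regex to the full
URL. As far as I know, --accept-regex/--reject-regex are not available
in wget 1.12, so this needs a newer version.

```shell
#!/bin/sh
# Hypothetical pattern: assumes state-changing links carry an "action="
# query parameter. Adjust the list after inspecting your wiki's HTML.
REJECT_RE='[?&]action=(watch|unwatch|rollback|edit|delete|revert)'

# Returns 0 (true) if the URL matches the reject pattern.
# grep -E stands in for wget's POSIX regex matching (an approximation).
would_reject() {
    printf '%s\n' "$1" | grep -Eq "$REJECT_RE"
}

# Dry check against sample URLs (invented for this example, no network).
for url in \
    'https://wiki.com/index.php?title=Main_Page' \
    'https://wiki.com/index.php?title=Main_Page&action=watch' \
    'https://wiki.com/index.php?title=Foo&action=rollback&from=Bar'
do
    if would_reject "$url"; then
        echo "skip  $url"
    else
        echo "fetch $url"
    fi
done

# The actual crawl would then look roughly like this (not run here):
# wget --recursive --no-parent --reject-regex "$REJECT_RE" https://wiki.com/
```

Once the pattern skips only the URLs you expect, pass the same string
to --reject-regex alongside your other options.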

> I'm using the following version:
> 
>> wget --version
> GNU Wget 1.12 built on linux-gnu.

OK, you should update wget if possible. The latest version is 1.19.5.

Regards, Tim


