[Wget-dev] wget2 | [Unverified] Wget2 _may_ be ignoring the robots file

From: Darshit Shah
Subject: [Wget-dev] wget2 | [Unverified] Wget2 _may_ be ignoring the robots file on restart (#398)
Date: Fri, 24 Aug 2018 10:42:06 +0000

New Issue was created.

Issue 398: https://gitlab.com/gnuwget/wget2/issues/398
Author:    Darshit Shah

I just noticed this but haven't had a chance to verify the exact issue. It
seems that if the server has a robots.txt that prohibits Wget from running,
Wget exits the first time. But if you restart Wget, it just starts crawling
the site regardless of the robots.txt.

My guess is that when it detects that the robots.txt file has already been
downloaded, it short-circuits that code path, so the robots check never runs.
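
To make the guess above concrete, here is a minimal model of the suspected
behaviour. It is a sketch under stated assumptions, not Wget2 code: the
`Crawler` class, its method names, and the `saved_files` dict (standing in for
files already on disk) are all hypothetical, and the parsing is done with
Python's standard `urllib.robotparser` rather than Wget2's own parser.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt that forbids every crawler from every path.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


class Crawler:
    """Hypothetical crawler; models only the robots.txt handling."""

    def __init__(self, saved_files):
        # saved_files maps file name -> body and survives across "restarts".
        self.saved_files = saved_files
        self.rules = None  # no rules loaded yet

    @staticmethod
    def _parse(body):
        parser = RobotFileParser()
        parser.parse(body.splitlines())
        return parser

    def load_robots_suspected(self, fetch):
        # Suspected bug: the "already downloaded" check skips the download
        # AND the parse, so no rules are ever loaded on a restart.
        if "robots.txt" in self.saved_files:
            return
        body = fetch()
        self.saved_files["robots.txt"] = body
        self.rules = self._parse(body)

    def load_robots_always_parse(self, fetch):
        # Possible fix: skip only the download, never the parse.
        body = self.saved_files.get("robots.txt")
        if body is None:
            body = fetch()
            self.saved_files["robots.txt"] = body
        self.rules = self._parse(body)

    def allowed(self, url):
        # With no rules loaded, everything looks allowed.
        return self.rules is None or self.rules.can_fetch("Wget", url)
```

In this model, the first run fetches and parses the file and refuses to
crawl. A second `Crawler` created with the same `saved_files` and loaded via
`load_robots_suspected` ends up with no rules and allows every URL, which
matches the behaviour described. Whether Wget2 actually takes this path is
still unverified.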

Do we even need to store the robots file? I've never seen it in Wget 1.x.
