[Wget-dev] wget2 | [Unverified] Wget2 _may_ be ignoring the robots file
From: Darshit Shah
Subject: [Wget-dev] wget2 | [Unverified] Wget2 _may_ be ignoring the robots file on restart (#398)
Date: Fri, 24 Aug 2018 10:42:06 +0000
New Issue was created.
Issue 398: https://gitlab.com/gnuwget/wget2/issues/398
Author: Darshit Shah
Assignee:
So, I just noticed this, but haven't had a chance to verify the exact issue. It
seems that if the server has a robots.txt that prohibits Wget from running, Wget
exits the first time. But if you restart Wget, it just starts crawling the site
regardless of the robots.txt.
My guess is that when it detects that the robots.txt file has already been
downloaded, it short-circuits the code path, preventing the robots check from
ever happening.
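To make that guess concrete, here is a minimal Python sketch of the control flow I suspect. It is purely illustrative and unverified: the function names, the dict standing in for files saved on disk, and the crude robots parser are all assumptions and are not taken from the Wget2 source.

```python
# Hypothetical model of the suspected bug. None of these names come from Wget2.

def disallows_all(robots_text):
    """Crude check (assumption): does any line read 'Disallow: /'?"""
    return any(
        line.strip().lower().replace(" ", "") == "disallow:/"
        for line in robots_text.splitlines()
    )


def crawl_buggy(host, fetch_robots, saved_files):
    """Return True if crawling proceeds. Models the suspected short circuit."""
    path = host + "/robots.txt"
    if path in saved_files:
        # Suspected bug: the file already exists, so the download is skipped,
        # and with it the parsing and enforcement of the rules.
        return True
    saved_files[path] = fetch_robots(host)
    return not disallows_all(saved_files[path])


def crawl_fixed(host, fetch_robots, saved_files):
    """Return True if crawling proceeds. Rules are checked on every run."""
    path = host + "/robots.txt"
    if path not in saved_files:
        saved_files[path] = fetch_robots(host)
    # Apply the rules whether the file was just fetched or already present.
    return not disallows_all(saved_files[path])
```

In this model, the first call to `crawl_buggy` refuses to crawl a host whose robots.txt disallows everything, but a second call with the same `saved_files` proceeds anyway, which matches the behaviour described above. `crawl_fixed` refuses on both runs.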
Do we even need to store the robots file? I've never seen Wget 1.x save it.