Re: [Bug-wget] robots.txt seemingly ignored
From: Darshit Shah
Subject: Re: [Bug-wget] robots.txt seemingly ignored
Date: Tue, 15 May 2018 11:34:33 +0200
User-agent: NeoMutt/20180323
Hi,
You are using a very old version of Wget; v1.12 was released in 2009, if I
remember correctly.
The current version of Wget has no trouble parsing that robots.txt. I just
tried it locally and it downloads no files at all.
Please update your version of Wget.
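For what it's worth, Python's standard-library robots.txt parser reads the file
the same way a current Wget does: a generic agent is disallowed everywhere,
while W3C-checklink is allowed. A minimal sketch (the URL is just the one from
your report):

```python
# Parse the robots.txt in question with the stdlib robotparser and check
# what a generic crawler vs. W3C-checklink is allowed to fetch.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
    "",
    "User-Agent: W3C-checklink",
    "Disallow:",
])

print(rp.can_fetch("Wget", "http://wwwdev.nber.org/"))           # False
print(rp.can_fetch("W3C-checklink", "http://wwwdev.nber.org/"))  # True
```

So the file itself is fine; the misbehaviour was in the old Wget, not in your
robots.txt.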
* Daniel Feenberg <address@hidden> [180514 16:51]:
>
> I have the following wget command line:
>
> wget -r http://wwwdev.nber.org/
>
> http://wwwdev.nber.org/robots.txt is:
>
> User-agent: *
> Disallow: /
>
> User-Agent: W3C-checklink
> Disallow:
>
>
> However wget fetches thousands of pages from wwwdev.nber.org. I would have
> thought nothing would be found. (This is a demonstration, obviously in real
> life I'd have a more detailed robots.txt to control the process).
>
> Obviously too, I don't understand something about wget or robots.txt. Can
> anyone help me out?
>
> This is GNU Wget 1.12 built on linux-gnu.
>
> Thank you
> Daniel Feenberg
>
--
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6