Re: [Bug-wget] robots.txt seemingly ignored
From: Darshit Shah
Subject: Re: [Bug-wget] robots.txt seemingly ignored
Date: Tue, 15 May 2018 11:34:33 +0200
User-agent: NeoMutt/20180323
Hi,
You are using a very old version of Wget; v1.12 was released in 2009, if I
remember correctly.
The current version of Wget has no trouble parsing that robots.txt. I just
tried it locally and it downloads no files at all.
Please update your version of Wget.
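For what it's worth, Python's standard-library robots.txt parser reads the file
the same way a current Wget does: a generic agent is disallowed everywhere,
while W3C-checklink is allowed. A minimal sketch (the URL is just the one from
your report):

```python
# Parse the robots.txt in question with the stdlib robotparser and check
# what a generic crawler vs. W3C-checklink is allowed to fetch.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
    "",
    "User-Agent: W3C-checklink",
    "Disallow:",
])

print(rp.can_fetch("Wget", "http://wwwdev.nber.org/"))           # False
print(rp.can_fetch("W3C-checklink", "http://wwwdev.nber.org/"))  # True
```

So the file itself is fine; the misbehaviour was in the old Wget, not in your
robots.txt.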
* Daniel Feenberg <address@hidden> [180514 16:51]:
>
> I have the following wget command line:
>
> wget -r http://wwwdev.nber.org/
>
> http://wwwdev.nber.org/robots.txt is:
>
> User-agent: *
> Disallow: /
>
> User-Agent: W3C-checklink
> Disallow:
>
>
> However wget fetches thousands of pages from wwwdev.nber.org. I would have
> thought nothing would be found. (This is a demonstration, obviously in real
> life I'd have a more detailed robots.txt to control the process).
>
> Obviously too, I don't understand something about wget or robots.txt. Can
> anyone help me out?
>
> This is GNU Wget 1.12 built on linux-gnu.
>
> Thank you
> Daniel Feenberg
>
--
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6