[Bug-wget] robots.txt seemingly ignored
From: Daniel Feenberg
Subject: [Bug-wget] robots.txt seemingly ignored
Date: Mon, 14 May 2018 09:39:39 -0400 (EDT)
User-agent: Alpine 2.21 (LRH 202 2017-01-01)

I have the following wget command line:

    wget -r http://wwwdev.nber.org/

http://wwwdev.nber.org/robots.txt is:

    User-agent: *
    Disallow: /
    User-Agent: W3C-checklink
    Disallow:

However, wget fetches thousands of pages from wwwdev.nber.org. I would have
thought nothing would be fetched. (This is a demonstration; obviously in
real life I'd have a more detailed robots.txt to control the process.)
Evidently I don't understand something about wget or robots.txt. Can
anyone help me out?
This is GNU Wget 1.12 built on linux-gnu.
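
One way to check the rules themselves, independently of wget, is to run them through a standard robots.txt parser. The following is a minimal sketch using Python's urllib.robotparser, which is not wget's own parser and so says nothing about what wget does internally. The helper name `allowed` and the user-agent string "Wget/1.12" are assumptions chosen for illustration; the rules are copied as quoted above.

```python
from urllib.robotparser import RobotFileParser

# The rules as quoted in this message.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
User-Agent: W3C-checklink
Disallow:
"""


def allowed(agent: str, url: str) -> bool:
    """Return True if the quoted rules permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://wwwdev.nber.org/"
    # "Wget/1.12" is an assumed user-agent string, used only as an example.
    for agent in ("Wget/1.12", "W3C-checklink"):
        print(agent, "allowed:", allowed(agent, url))
```

If this parser reports the generic agent as blocked and W3C-checklink as permitted, the file reads the way I intended, and the question is about wget's handling rather than the rules.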
Thank you
Daniel Feenberg