Re: [Bug-wget] Bug in <meta name="robots" content="nofollow" />
From: Micah Cowan
Subject: Re: [Bug-wget] Bug in <meta name="robots" content="nofollow" />
Date: Thu, 04 Mar 2010 14:52:55 -0800
User-agent: Thunderbird 2.0.0.23 (X11/20090817)

Augustin, Stefan wrote:
> Hello,
>
> I want to crawl a web site which uses
> <meta name="robots" content="nofollow" />
> in the HTML HEAD,
> which should be XHTML instead of plain HTML.
> But wget seems to ignore this control information.
>
> Unfortunately, I can't change the code in the HTML pages of this web server.
If I understand you correctly, I think you meant that "wget seems to
obey this control information"; otherwise, what would be preventing you
from crawling the web site?

Have a look at
http://wget.addictivecode.org/FrequentlyAskedQuestions#robots for the
solution.
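
For reference, the usual way to make wget ignore robots directives is to
turn off its `robots` setting for the run. The following is a minimal
sketch, not a quote from the FAQ: the URL is a placeholder and the `url`
and `cmd` variables are only illustrative. It builds the command and prints
it so it can be reviewed before running.

```shell
#!/bin/sh
# Hypothetical start URL; substitute the site you want to crawl.
url="http://example.com/"

# -r             follow links recursively
# -e robots=off  execute the wgetrc command "robots = off", so wget ignores
#                both /robots.txt and <meta name="robots" content="nofollow">
cmd="wget -r -e robots=off $url"

# Print the command for review; run it with: eval "$cmd"
echo "$cmd"
```

The same setting can be made permanent by putting `robots = off` in
~/.wgetrc, though that disables robots handling for every site, so the
per-invocation form is probably the safer choice.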
--
Micah J. Cowan
http://micah.cowan.name/