Re: [Bug-wget] can't reject robots.txt in recursive mode
From: Giuseppe Scrivano
Subject: Re: [Bug-wget] can't reject robots.txt in recursive mode
Date: Wed, 06 Aug 2014 15:38:43 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)
Ilya Basin <address@hidden> writes:
> Here's my script to download IBM javadocs:
>
> (
> rm -rf wget-test
> mkdir wget-test
> cd wget-test
>
> starturl="http://www-01.ibm.com/support/knowledgecenter/api/content/SSZLC2_7.0.0/com.ibm.commerce.api.doc/allclasses-noframe.html"
> wget -d -r -R robots.txt --page-requisites -nH --cut-dirs=5 --no-parent \
>   "$starturl" 2>&1 | tee wget.log
> )
>
> Regardless of the '-R' option, wget downloads robots.txt and refuses to
> follow links starting with "/support/knowledgecenter/api/".
There is no need for any workaround; you should be able to achieve the
same behavior with "-e robots=off", as documented.
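For example, something along these lines should work (an untested sketch
based on your script, with "-e robots=off" replacing the "-R robots.txt"
workaround):

(
  rm -rf wget-test
  mkdir wget-test
  cd wget-test

  starturl="http://www-01.ibm.com/support/knowledgecenter/api/content/SSZLC2_7.0.0/com.ibm.commerce.api.doc/allclasses-noframe.html"

  # Disable robots.txt processing entirely instead of trying to reject the file:
  wget -d -r -e robots=off --page-requisites -nH --cut-dirs=5 --no-parent \
    "$starturl" 2>&1 | tee wget.log
)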
Regards,
Giuseppe