[Bug-wget] Bug in <meta name="robots" content="nofollow" />
From: Augustin, Stefan
Subject: [Bug-wget] Bug in <meta name="robots" content="nofollow" />
Date: Thu, 4 Mar 2010 15:54:19 +0100
Hello,
I want to crawl a web site which uses
<meta name="robots" content="nofollow" />
in the HTML HEAD.
The tag is written in XHTML syntax (self-closing) rather than plain HTML.
However, wget seems to ignore this directive.
Unfortunately, I can't change the code in the HTML pages of this web server.
Can somebody help me?
- Is this a bug (or an unimplemented feature) in wget?
- If so, is there a fix available?
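
For illustration, here is a minimal sketch of the check in question: detecting a robots "nofollow" directive regardless of whether the meta tag uses XHTML or plain HTML syntax. It is written in Python with the standard library only and says nothing about how wget handles the tag internally; the names RobotsMetaParser and has_nofollow are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also reached for XHTML-style self-closing tags (<meta ... />),
        # because the default handle_startendtag() delegates to this method.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated and compared case-insensitively.
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def has_nofollow(html_text):
    """Return True if the page carries a robots 'nofollow' directive."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    parser.close()
    return "nofollow" in parser.directives
```

Calling has_nofollow() on the saved page source should show whether the directive is present in the markup, which helps separate a markup problem from a crawler problem.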
Best regards
Stefan Augustin
Siemens AG
Corporate Technology
CT IC 1
Otto-Hahn-Ring 6
81739 München, Deutschland
Tel.: +49 (89) 636-47061
Fax: +49 (89) 636-49438
Mobil: +49 (172) 8455616