From: Micah Cowan
Subject: Re: [Bug-wget] Fwd: Trying to download HTML from Google's Cache. Pls help
Date: Tue, 11 Nov 2008 12:27:05 -0800
User-agent: Thunderbird 2.0.0.17 (X11/20080914)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Ben Smith wrote:
> Subject: Re: [Bug-wget] Re: Bug-wget Digest, Vol 1, Issue 10
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Bug-wget digest..."
It's helpful if you adhere to this guideline; otherwise it's hard to
follow threads. (I've fixed the subject in my reply.)
> It would be theoretically possible by using grep and sed to strip out
> the links to the cached files and piping that to wget. However,
> Google appears to block access to results pages and cached pages via
> wget. I tried to download several using wget and got a 403 Forbidden
> response.
http://wget.addictivecode.org/FrequentlyAskedQuestions#not-downloading
should be helpful for such problems. Using -U to send a different
User-Agent string is the most applicable suggestion there, but you may
also run into the other causes it lists. Please also consider adding
--limit-rate or --wait, so that your requests put less load on the
server.
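
As a rough sketch of how the options mentioned above might be combined
(not a tested recipe for Google's cache in particular): the User-Agent
string, the two-second wait, the 50k rate cap, the function name
fetch_politely and the file name urls.txt are all illustrative
assumptions, not values taken from this thread. The WGET variable is
only there so that the command can be previewed without touching the
network.

```shell
#!/bin/sh
# Assumption: any browser-like User-Agent string; this one is made up.
UA='Mozilla/5.0 (X11; Linux x86_64)'

# fetch_politely FILE
#   Download every URL listed in FILE (one per line).
#   -U            sends the given User-Agent header instead of "Wget/x.y"
#   --wait        pauses that many seconds between retrievals
#   --limit-rate  caps the download speed
#   -i            reads the URLs from FILE
# Set WGET=echo to print the arguments instead of running wget.
fetch_politely() {
    "${WGET:-wget}" -U "$UA" --wait=2 --limit-rate=50k -i "$1"
}
```

Calling `fetch_politely urls.txt` would then fetch each listed URL in
turn; the list itself could come from the grep/sed step described in
the quoted message.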
- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFJGeqZ7M8hyUobTrERAnb3AJ9QExH/DgExUu+9TMVLMzyEcXGLQgCeIwYf
//x+tvr1nFsS978kVWX75cg=
=tZzE
-----END PGP SIGNATURE-----