Re: [Bug-wget] Potential bug or something else?
From: Giuseppe Scrivano
Subject: Re: [Bug-wget] Potential bug or something else?
Date: Thu, 20 May 2010 19:23:49 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux)
What web site are you trying to access, and which wget version are you
using?
It looks like chunked transfer encoding: the server sends chunked data
regardless of the HTTP version that wget specifies in its request, so
the hex numbers you see would be the chunk sizes. You can try building
wget from the source repository, or from a recent alpha tarball, where
HTTP/1.1 is supported.
Cheers,
Giuseppe
Mike <address@hidden> writes:
> Hi,
>
> I have been downloading some pages from one of my sites, but I
> sometimes see two 4-digit hex codes appear in the HTML source:
>
> Here's the start of one page:
>
> "209b
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
> "http://www.w3.org/TR/html4/strict.dtd">
> <html>
>
> <head>"
>
> The other 4-digit code appears later on in the page.
>
> Has anyone ever seen this before? It definitely doesn't appear on
> the original page. It appears on all HTML files in particular
> directories, but some directories are clean.
>
> I'm running with this wget call:
> wget -A html,php,htm -b --default-page=__SLASH__.html --random-wait
> http://www.whateverurl.co.uk -w 10 -r -k -l 100 -U "Botlet"
>
> Any help much appreciated. I can add some post-processing to remove
> the codes but that feels like a hack. Any ideas what it might be?
>
> Thanks,
> Mike.