[Bug-wget] [bug #50935] TEXTHTML not properly set if page is already dow
From: Tim Ruehsen
Subject: [Bug-wget] [bug #50935] TEXTHTML not properly set if page is already downloaded
Date: Fri, 12 May 2017 04:02:25 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
Update of bug #50935 (project wget):
Status: Need Info => Confirmed
_______________________________________________________
Follow-up Comment #3:
Sorry, my stupidity :-)
I only got as far as the first command, where everything was fine, so I didn't
really check the next command :-(
You are right: if the file exists, the -p -nc combination says 'File ...
already there; not retrieving.' and does nothing.
Instead, it should read and parse that file (after checking that it really is
an HTML or CSS file). Wget currently has no heuristic for that, so it should
make a HEAD request to check the Content-Type. What Wget really does is look
at the file name extension.
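So you can work around it by letting -E (--adjust-extension) save the page with
an .html suffix in the first run, which the second run then recognizes: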
wget -xHE -nc 'https://news.ycombinator.com/item?id=14245538'
wget -pH -nc 'https://news.ycombinator.com/item?id=14245538'
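
To illustrate why the -E step matters, here is a minimal Python sketch of an
extension-only check. It is not Wget's code: the function name, the set of
extensions, and the local file names below are assumptions made for
illustration only.

```python
import os

# Assumed set of extensions treated as parseable; not taken from Wget's source.
PARSEABLE_EXTENSIONS = {".html", ".htm", ".css"}


def looks_parseable(filename: str) -> bool:
    """Guess from the file name extension alone whether a file is HTML or CSS."""
    _, ext = os.path.splitext(filename)
    return ext.lower() in PARSEABLE_EXTENSIONS


# Hypothetical local names for the page, saved without and with -E.
without_e = "news.ycombinator.com/item?id=14245538"
with_e = "news.ycombinator.com/item?id=14245538.html"

print(looks_parseable(without_e))  # no extension, so the file would be skipped
print(looks_parseable(with_e))     # .html suffix, so the file would be parsed
```

With a check like this, a page saved under its bare query-string name is never
parsed for requisites, while the same page saved with an .html suffix is.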
I will add this issue as a reference in Wget2 development, where we will do it
correctly (using HEAD request).
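
The HEAD-based check could look roughly like the following Python sketch. It
is not Wget2 code: the helper names, the list of accepted media types, and the
timeout are assumptions, and it uses only the standard urllib module.

```python
import urllib.request

# Assumed media types that would be worth parsing for page requisites.
PARSEABLE_TYPES = {"text/html", "application/xhtml+xml", "text/css"}


def media_type(content_type_header: str) -> str:
    """Drop parameters such as '; charset=utf-8' and normalise the case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def is_parseable_type(content_type_header: str) -> bool:
    """Return True if the Content-Type value names an HTML or CSS document."""
    return media_type(content_type_header) in PARSEABLE_TYPES


def head_content_type(url: str, timeout: float = 10.0) -> str:
    """Send a HEAD request and return the raw Content-Type header (or '')."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type", "")


# The header parsing can be tried without any network access:
print(is_parseable_type("text/html; charset=utf-8"))
print(is_parseable_type("image/png"))
```

A caller would pass the result of head_content_type(url) to
is_parseable_type() before deciding whether to parse an existing local file.
The decision then no longer depends on how the file happens to be named.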
Thanks for your report!
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?50935>
_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/