bug-wget

Downloading a web page's html code: Wget vs. Chrome's "Save Page WE" ext


From: michel . kempeneers
Subject: Downloading a web page's html code: Wget vs. Chrome's "Save Page WE" extension
Date: Thu, 2 Jan 2020 15:29:16 +0100 (CET)

Hi, 

when I use Wget to download the html code of this eBay page --- it's just an 
example, no strings attached: 
[ 
https://www.ebay.fr/itm/Cham-La-Civilisation-a-la-Porte-CARICATURE-turquie/143485908416
 | 
https://www.ebay.com/itm/The-Holy-Bible-King-James-Version-Old-New-Testaments-Black-GET-FREE-BIBLES/261537277493
 ] 
I get a *.txt file containing sufficient info to allow me to fetch the page's 
images. 

However, when I use the Chrome Extension "Save Page WE" to do the same, I 
obtain a file almost 10 times as big (!!), in which I can also find the 
object's description --- which is actually what I'm after. 
(it's on line 1275 according to Notepad++) 
This information is missing in Wget's version, and I wonder why. 
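
To make the comparison above concrete, here is a minimal sketch that reports 
the size of each saved file and whether a given phrase occurs in it. The 
helper name check_saved, the file names, and the search phrase are all 
hypothetical placeholders, not taken from the actual run: 

```shell
#!/bin/sh
# Hypothetical helper: for each saved file, print its size in bytes and
# whether it contains the given phrase (case-insensitive, literal match).
# Usage: check_saved PHRASE FILE...
check_saved() {
    phrase=$1
    shift
    for f in "$@"; do
        # wc -c may pad its output with spaces on some systems; strip them.
        size=$(wc -c < "$f" | tr -d ' ')
        if grep -qiF -- "$phrase" "$f"; then
            found=yes
        else
            found=no
        fi
        printf '%s\t%s bytes\tphrase found: %s\n' "$f" "$size" "$found"
    done
}
```

It could be called as: check_saved "some words from the description" URL.txt saved-by-extension.html 
(the second file name is assumed). It only shows which file contains the 
phrase; it says nothing about why the two files differ. 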

Is there a reason why Wget only seems to find a minimal version of the code of 
this page? 
(Or maybe the correct question is: why is the html file saved by that 
extension so much larger?) 
Or should I modify my command for better results? 
Here's (a schematic form of) the command I used: 

wget -nv -o log.txt -O URL.txt URL 

I'm tempted to think that this must be about internal Wget processes, as I 
don't see how I could influence this behavior by using other switches, but in 
fact I have no clue. 
If at all possible, I'd rather get the *complete* html code using Wget; at 
present I can only wonder why this is NOT the case... 

Please note that I'm not very knowledgeable about web pages. I can 
recognize/decipher some of the html tagging, but that's where it stops. 
Or, to put it another way: I definitely lack the knowledge and experience to 
tell whether the Wget version is correct or "incomplete"; I only observe that 
some of the information seems to be missing... 

Thx, 

M. 


