Downloading a web page's html code: Wget vs. Chrome's "Save Page WE" ext
From: michel . kempeneers
Subject: Downloading a web page's html code: Wget vs. Chrome's "Save Page WE" extension
Date: Thu, 2 Jan 2020 15:29:16 +0100 (CET)
Hi,
when I use Wget to download the html code of this eBay page --- it's just an
example, no strings attached:
https://www.ebay.fr/itm/Cham-La-Civilisation-a-la-Porte-CARICATURE-turquie/143485908416
https://www.ebay.com/itm/The-Holy-Bible-King-James-Version-Old-New-Testaments-Black-GET-FREE-BIBLES/261537277493
I get a *.txt file containing sufficient info to allow me to fetch the page's
images.
However, when I use the Chrome Extension "Save Page WE" to do the same, I
obtain a file almost 10 times as big (!!), in which I can also find the
object's description --- which is actually what I'm after.
(it's on line 1275 according to Notepad++)
This information is missing in Wget's version, and I wonder why.
Is there a reason why Wget only seems to find a minimal version of the code of
this page?
(or maybe the correct question is: why is the html file saved by that
extension so much larger?)
Or should I modify my command for better results?
Here's (a schematic form of) the command I used:
wget -nv -o log.txt -O URL.txt URL
I'm tempted to think that this must be about internal Wget processes, as I
don't see how I could influence this behavior by using other switches, but in
fact I have no clue.
If at all possible, I'd rather get the *complete* html code using Wget; at
present I can only wonder why this is NOT the case...
Please note that I'm not very knowledgeable about web pages. I can
recognize/decipher some of the html tagging, but that's where it stops.
Or to put that in other words: I definitely lack the knowledge or experience to
see if the Wget version is correct, or "incomplete"; I only observe that some
of the information seems missing...
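To make that observation easier to check, here is a minimal Python sketch that
compares the two saved files: it reports the size of each and whether a given
phrase (for example a few words from the object's description) occurs in it.
The function name compare_saved_pages and the phrase are made up for
illustration, and the sketch assumes both files can be read as UTF-8 text. It
only confirms what is or is not in each file; it does not explain why the two
differ.

```python
from pathlib import Path


def compare_saved_pages(wget_path, extension_path, needle):
    """Report the size of each saved file and whether `needle` occurs in it.

    wget_path      -- file written by Wget (URL.txt in the command above)
    extension_path -- file written by the browser extension (hypothetical name)
    needle         -- a phrase expected in the page, e.g. from the description
    """
    report = {}
    for label, path in (("wget", wget_path), ("extension", extension_path)):
        data = Path(path).read_bytes()
        # Assumption: the page is UTF-8; undecodable bytes are replaced
        # rather than raising an error.
        text = data.decode("utf-8", errors="replace")
        report[label] = {
            "bytes": len(data),
            "contains_needle": needle in text,
        }
    return report
```

It could be called as compare_saved_pages("URL.txt", "saved_page.html",
"a phrase from the description"), where saved_page.html stands for whatever
name the extension's file was given.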
Thx,
M.