From: Marshall Burns
Subject: [Bug-wget] Wget results VERY different from browser save
Date: Wed, 2 Jan 2019 12:30:06 -0600

On closer inspection, I've found that the results from Wget and Firefox are very different. Neither is perfect, but the Wget results are definitely wrong. Here are the results from both:

=================================================

wget --append-output=Wget_Google.log --show-progress --no-directories --adjust-extension --directory-prefix=download/Google2 --convert-links --backup-converted --page-requisites --span-hosts http://www.Google.com

Contents of folder "download\Google2":

2016 12 07  19:00       5,482 googlelogo_white_background_color_272x92dp.png
2019 01 02  11:10      11,587 index.html
2019 01 02  11:10      11,437 index.html.orig
2016 12 16  06:30      12,263 nav_logo229.png
2018 11 16  04:00       6,913 robots.txt
         5 File(s)     47,682 bytes

Wget log: See attached "Wget_Google.log".

Result of save as viewed in Firefox: See attached "Google from Wget.png".

=================================================

Firefox at https://www.google.com/
File > Save Page As > Save as type: Web Page, complete

Contents of folder:

2019 01 02  11:15     222,403 Google2.htm
2019 01 02  11:15    <DIR>    Google2_files

Contents of subfolder "Google2_files"

2019 01 02  11:15     140,084 cbgapi.loaded_0
2019 01 02  11:15      13,504 googlelogo_color_272x92dp.png
2019 01 02  11:15      85,565 msb_wizaaabdasyncdvlfootiflipv6lummusfxz7cCd
2019 01 02  11:15     140,913 rsAA2YrTv-X7m9A6GmnfpSsKdPIfvIYg06ZQ
2019 01 02  11:15     403,380 rsACT90oGMg6Rr6Oa277nSkJoiMyEfVXOeOQ
         5 File(s)    783,446 bytes

Result of save as viewed in Firefox: See attached "Google from Firefox.png".

=================================================

Actual appearance of the webpage: See attached "Google original.png".

=================================================

Observations:

* The main file saved by Firefox is 218 kb, that by Wget is only 12 kb.
* Firefox saves five additional files, Wget only three, and none of them even have the same filenames!
* Firefox gets the page layout right, including headers and footers, but for some reason doesn't show the logo.
* Wget looks like it downloaded a different page. The whole layout is different. But it got the logo right.

What do I need to do for Wget to get the page correctly?

Thank you.

=================================================

From: address@hidden [mailto:address@hidden
Sent: Wednesday, January 2, 2019 04:50
To: 'address@hidden'
Subject: How to simulate "Save as webpage, complete"?

Hi, not a bug, but a question:

The command:

wget --no-directories --adjust-extension --directory-prefix _files --convert-links --page-requisites --span-hosts http://www.Google.com

saves the Google homepage as "index.html" along with associated files, all together in the folder "_files". The result works nicely, but what I want is for "index.html" to be in one folder and the associated files to be in a subfolder of that called "_files". This is what a browser does when one asks it to "save as webpage, complete." How do I simulate that behavior with Wget?

The manual entry for -P / --directory-prefix says "the directory prefix is the directory where all other files and subdirectories will be saved." Because of the word "other," I thought this would do what I want, but it didn't. It put all the files in the same directory, including "index.html".

I am using Wget, v. 1.20 as the Windows binary provided by Jernej Simončič at www.eternallybored.org/misc/wget/ and running it in a DOS window ("Command Prompt") of Windows 7.

Thanks for your help.

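One possible workaround for the folder layout asked about above, offered as a sketch under assumptions and not as a confirmed Wget feature: let Wget save everything into a single folder as in the commands above, then post-process that folder with a small script. The function name `split_requisites` and its parameters are hypothetical. The sketch assumes that `--no-directories` combined with `--convert-links` leaves references to the downloaded files as bare file names inside quoted attributes of "index.html"; that should be checked against real output before relying on it.

```python
import re
import shutil
from pathlib import Path


def split_requisites(folder, main_name="index.html", sub_name="_files"):
    """Move every file except the main page into folder/sub_name and
    prefix matching links in the main page with sub_name/.

    Hypothetical helper: assumes links to requisites appear in the main
    page as quoted bare file names, e.g. src="logo.png".
    """
    folder = Path(folder)
    main = folder / main_name
    sub = folder / sub_name
    sub.mkdir(exist_ok=True)

    moved = []
    for item in sorted(folder.iterdir()):
        # Leave directories, the main page, and Wget's .orig backup in place.
        if item.is_dir() or item.name in (main_name, main_name + ".orig"):
            continue
        shutil.move(str(item), str(sub / item.name))
        moved.append(item.name)

    # surrogateescape lets bytes that are not valid UTF-8 round-trip unchanged.
    html = main.read_text(encoding="utf-8", errors="surrogateescape")
    for name in moved:
        # Match only attribute values that are exactly the bare file name,
        # in single or double quotes; absolute URLs are left untouched.
        pattern = r"([\"'])" + re.escape(name) + r"\1"
        html = re.sub(
            pattern,
            lambda m, n=name: f"{m.group(1)}{sub_name}/{n}{m.group(1)}",
            html,
        )
    main.write_text(html, encoding="utf-8", errors="surrogateescape")
    return moved
```

Calling `split_requisites("download/Google2")` after the Wget run would leave "index.html" (and "index.html.orig", if present) in place and move the remaining files into "download/Google2/_files". The plain string rewrite does not touch references inside CSS or JavaScript, so pages that load resources from those may still need manual fixes.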
Wget_Google.log
Description: Binary data
Google from Wget.png
Description: PNG image
Google from Firefox.png
Description: PNG image
Google original.png
Description: PNG image