From: address@hidden
Subject: Re: Problematic default file naming system (BUG?)
Date: Sun, 15 Oct 2023 21:39:22 +0200 (SAST)
Functioning as designed ...
(Disclaimer: I am not an expert user of this program, but I have some
experience that may help you:)
I guess you are Windows users. Unlike on Unix and Linux systems, in Windows the
last part of a file name (anything following the last, "rightmost", period) is
considered the file extension, and it is used to decide which application
opens the file by default. For example, a .html file is opened by a browser,
and a .doc (or nowadays a .docx) file is handed to a word processor such as
Microsoft Word, whose program file is WINWORD.EXE (the .exe indicating that
this file contains executable code), etc.
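To make the extension point concrete, here is a small Python sketch. The association table and the default_handler function are made up for illustration (real Windows associations live in the registry); the only part taken from real behaviour is that pathlib's PureWindowsPath.suffix, like Windows, looks only at the text after the last period.

```python
from pathlib import PureWindowsPath
from typing import Optional

# Hypothetical association table, for illustration only; on a real
# Windows system these associations are configured in the registry.
ASSOCIATIONS = {
    ".html": "web browser",
    ".docx": "word processor",
    ".jpg": "image viewer",
}

def default_handler(filename: str) -> Optional[str]:
    """Return the (hypothetical) default application for a file name."""
    # Only the text after the last period counts as the extension:
    # for "index.html.1" that is ".1", not ".html".
    extension = PureWindowsPath(filename).suffix.lower()
    return ASSOCIATIONS.get(extension)

for name in ["index.html", "index.html.1", "JamesBond007.jpg"]:
    print(f"{name!r}: {default_handler(name)}")
```

For index.html.1 the extension is ".1", which the table doesn't know, so no application is found. That is why such a file looks like an unknown format to Windows, even though its content is perfectly good HTML.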
That is one aspect of what is going on here: you downloaded something that was
a .html file, but you didn't give it a name (and the URL didn't end in one
either). Somewhere in the documentation it will tell you that (and presumably
why) wget gives such a file the default file name index.html.
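A rough model of where that default name comes from, again as a Python sketch under simplifying assumptions: only the last part of the URL path is used, query strings and wget's other naming rules are ignored, and local_name and its default_page parameter are hypothetical names of my own. The parameter only mirrors the idea behind wget's documented --default-page option, which replaces the index.html default.

```python
from posixpath import basename
from urllib.parse import urlsplit

def local_name(url: str, default_page: str = "index.html") -> str:
    """Simplified guess at the local file name for a downloaded URL.

    Only the last component of the URL path is used; query strings and
    wget's other naming rules are deliberately ignored.
    """
    # basename() returns "" when the path is empty or ends in "/",
    # i.e. when the URL itself doesn't name a file.
    last = basename(urlsplit(url).path)
    return last or default_page

print(local_name("https://www.gnu.org/software/wget/"))
print(local_name("https://example.com/JamesBond007.jpg"))
```

The first call falls back to index.html because the path ends in a slash; the second keeps the name that the URL already provides.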
The other aspect is what happens if you download a file to a location where a
file with the same name and extension is already present. There are a few
options, which you choose between using command-line parameters, and each of
them makes good sense in some circumstances and none at all in others. (I'll
let you dig through the wget documentation yourself, since that is an important
part of testing (evaluating) the program as part of your project ;-)
The most obvious choices you may want to try out are the following (they apply
regardless of whether you are downloading a file named index.html or an image
file named JamesBond007.jpg; I'll use index.html as the example):
First option:
Your existing file index.html is now outdated, and the new version, with the
same file name, will overwrite it. (Hint: in the language of the
documentation, it will "clobber" the file.)
Second option:
Your existing file should not be overwritten ("clobbered"), so even though your
new file was meant to have the same name, it will be called index.html.1, or
index.html.2, or eventually index.html.4711, and so on. This is what you are
seeing, because it is what wget does by default for a plain download. It may
not be pretty, but it is effective. Windows users would typically expect a
different syntax (but wget is not just for Windows): index (1).html,
index (2).html, ..., index (4711).html might look more acceptable to you ...
Third option:
When downloading files across a notoriously unreliable line, the process may be
interrupted by a line failure before the file is complete. Wget then gives you
the option to continue the download, appending the data from the retry to the
end of the existing file. In my life that has been the option I used most,
especially since Murphy's Law stipulates that the worse your line, the bigger
your files.
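The three options can be modelled in a few lines of Python. This is a hypothetical sketch, not wget's implementation: the policy names "clobber", "number" and "continue" are my own labels rather than wget options, and the "continue" branch assumes the existing file really is the beginning of the same download, which is the only case where resuming makes sense.

```python
import tempfile
from pathlib import Path

POLICIES = ("clobber", "number", "continue")

def save_download(directory: Path, name: str, data: bytes, policy: str) -> Path:
    """Hypothetical model of the three options; not wget's actual code.

    clobber:  overwrite an existing file of the same name.
    number:   keep the existing file and write name.1, name.2, ... instead.
    continue: treat the existing file as an interrupted download and
              append only the bytes it is still missing.
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    target = directory / name
    # With no existing file, all three policies simply write it.
    if not target.exists() or policy == "clobber":
        target.write_bytes(data)
        return target
    if policy == "number":
        # Find the first free counter: name.1, name.2, ...
        n = 1
        while (directory / f"{name}.{n}").exists():
            n += 1
        numbered = directory / f"{name}.{n}"
        numbered.write_bytes(data)
        return numbered
    # policy == "continue": a real client would ask the server for only
    # the missing part (e.g. an HTTP Range header "bytes=<have>-"); here
    # the whole payload is already in memory, so we slice off the tail.
    have = target.stat().st_size
    with target.open("ab") as f:
        f.write(data[have:])
    return target

with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    for version in (b"v1", b"v2", b"v3"):
        print(save_download(d, "index.html", version, "number").name)
```

With "number", the demo prints index.html, index.html.1 and index.html.2, and the first file keeps its original content. With "clobber", the old content is gone for good, which is why the choice matters.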
Obviously, wget can't decide for you which of these options you need in any
given situation, and it is pretty much impossible to fix the results after the
fact if you chose the wrong one. What you can do, though, is rename all the
.1, .2, .3, etc. files to something more sensible. And when you plan to
download complete web sites or similar groups of files, wget offers ways to
store them under sensible names (most likely taken from your source) in a
suitable directory structure (e.g. one that duplicates the source structure).
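If you do end up with a pile of .1, .2, ... files, renaming them into the Windows-style form is easy to script. Here is a minimal Python sketch, assuming the counter always comes after the real extension (as in index.html.1). windows_style_name and rename_numbered are hypothetical helpers, and files whose new name is already taken are skipped rather than overwritten.

```python
import re
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple

# "index.html.1" -> stem "index", extension ".html", counter "1".
# The extension is whatever sits between the last two periods.
NUMBERED = re.compile(r"^(?P<stem>.+?)(?P<ext>\.[^.]+)\.(?P<n>\d+)$")

def windows_style_name(name: str) -> Optional[str]:
    """Map "index.html.1" to "index (1).html"; None if the name doesn't fit."""
    m = NUMBERED.match(name)
    if m is None:
        return None
    return f"{m['stem']} ({m['n']}){m['ext']}"

def rename_numbered(directory: Path) -> List[Tuple[str, str]]:
    """Rename every "name.ext.N" file in directory to "name (N).ext".

    Files whose new name is already taken are left alone, not clobbered.
    """
    renamed = []
    # sorted() builds the full list first, so renaming while looping is safe.
    for path in sorted(directory.iterdir()):
        if not path.is_file():
            continue
        new_name = windows_style_name(path.name)
        if new_name is None or (directory / new_name).exists():
            continue
        path.rename(directory / new_name)
        renamed.append((path.name, new_name))
    return renamed

with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    for name in ("index.html", "index.html.1", "index.html.2"):
        (d / name).write_text("<html></html>")
    print(rename_numbered(d))
```

In the demo, index.html is left as it is, and the two numbered copies become index (1).html and index (2).html.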
Study the documentation that came with your downloaded copy of wget (or find it
elsewhere on the web) and play with the program a bit more. Do come back here
for more advice if and when needed. And I'll let the experts answer when their
input is needed ;-)
Good luck,
Gerd
----- Original Message -----
From: "Joel F Leppänen" <Joel.F.Leppanen@student.lut.fi>
To: "bug-wget" <bug-wget@gnu.org>
Sent: Sunday, October 15, 2023 4:44:33 PM
Subject: Problematic default file naming system (BUG?)
Hi all,
We’re testing wget version 1.24.4 for a school project. When downloading an
.html file without naming it, and then downloading additional .html files, also
unnamed, wget saves the second and subsequent files with extensions that don’t
exist. The first one is saved as ”index.html”, the second one as
”index.html.1”, the third one as ”index.html.2” and so forth. The files can of
course be renamed back to .html afterwards, but I feel like this is a bug that
affects user experience negatively (or it’s intended, but I can’t figure out
why that would be).
Regards,
Joel Leppänen and Werneri Punavaara
LUT University