wget-dev

wget | wget should save directory listings as index.html (#11)


From: Yaroslav Nikitenko (@ynikitenko)
Date: Wed, 25 May 2022 09:54:48 +0000


Yaroslav Nikitenko created an issue: https://gitlab.com/gnuwget/wget/-/issues/11



When I recursively download a site (to serve it statically), upper-level paths
are saved to the local file system as directories, so the new server returns
them as directory listings, whereas the original site serves custom-made HTML
pages at those paths.

Example: if I have `mysite.org/1`, it is saved correctly as a file `1` (I
don't want to add extensions or change paths). However, when the site has both
a page `mysite.org/petitions` and a page `mysite.org/petitions/1`, the web page
for `mysite.org/petitions` is not saved, because it is overwritten by the local
directory `petitions`.
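The clash comes from the file system itself: one local path cannot be both a regular file and a directory. A minimal Python sketch of the collision, using the `petitions` path from the example (the function name and file contents are illustrative, not taken from wget):

```python
import os
import tempfile


def save_conflict_demo():
    """Show that `petitions` cannot be both a saved page and a directory."""
    with tempfile.TemporaryDirectory() as root:
        page = os.path.join(root, "petitions")
        # Save the page for mysite.org/petitions as a plain file named `petitions`.
        with open(page, "w") as f:
            f.write("<html>petitions</html>")
        # Saving mysite.org/petitions/1 needs `petitions` to be a directory.
        try:
            os.makedirs(page)
        except FileExistsError:
            return "conflict: 'petitions' is already a file"
        return "no conflict"


print(save_conflict_demo())
```

Whichever of the two URLs is saved first, one of them has to give way, which is why the page for `mysite.org/petitions` gets lost.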

Proposal: if a web page turns out to also be a directory, it should not be
overwritten but saved as `index.html` within that directory (or at least there
should be an option to do that).
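The proposed rule can be sketched as a two-pass mapping from URLs to local paths: first collect every directory that some downloaded URL needs, then store any page whose path coincides with one of those directories (or whose URL ends in `/`) as `index.html` inside it. This is a hypothetical helper (`plan_local_paths` is an invented name), not wget's actual code, and it assumes paths are kept unchanged apart from that rule:

```python
import posixpath
from urllib.parse import urlsplit


def plan_local_paths(urls):
    """Hypothetical sketch of the proposal: map each URL to a local path,
    saving a page that is also a directory as <dir>/index.html."""
    # First pass: naive host/path mapping, e.g. mysite.org/1 -> mysite.org/1.
    naive = {}
    for url in urls:
        parts = urlsplit(url)
        path = parts.path.strip("/")
        local = posixpath.join(parts.netloc, path) if path else parts.netloc
        is_dir_url = parts.path == "" or parts.path.endswith("/")
        naive[url] = (local, is_dir_url)

    # Collect every directory some saved file will need.
    needed_dirs = set()
    for local, is_dir_url in naive.values():
        head = local if is_dir_url else posixpath.dirname(local)
        while head:
            needed_dirs.add(head)
            head = posixpath.dirname(head)

    # Second pass: pages that coincide with a needed directory go to index.html.
    plan = {}
    for url, (local, is_dir_url) in naive.items():
        if is_dir_url or local in needed_dirs:
            plan[url] = posixpath.join(local, "index.html")
        else:
            plan[url] = local
    return plan


for path in plan_local_paths([
    "http://mysite.org/1",
    "http://mysite.org/petitions",
    "http://mysite.org/petitions/1",
]).values():
    print(path)
```

With the URLs from the example, `mysite.org/1` and `mysite.org/petitions/1` keep their names, while the page for `mysite.org/petitions` lands at `mysite.org/petitions/index.html`, so nothing is overwritten. The mapping has to be planned over the whole URL set, because whether `petitions` is a page or a directory is only known once `petitions/1` has been seen.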

Downside: some sites probably want their directory paths to be simple file
listings. However, good web design discourages that: URLs should be readable by
humans, and every part of a URL should have some meaning (if there is a file
`mysite/petitions/1`, then the path `mysite/petitions` should also be
available). For such sites, a separate option could preserve the old behaviour
(while saving the page as `index.html` inside the directory should remain the
default).

This problem has been reported on
[Super User](https://superuser.com/questions/790039/wget-doesnt-create-index-file-of-directory)
and elsewhere
([Unix & Linux Stack Exchange](https://unix.stackexchange.com/questions/629017/wget-skipping-index-html-for-links-not-ending-in-slashes-when-using-mirror)),
and has been
[known for a long time](https://lists.gnu.org/archive/html/bug-wget/2012-10/msg00026.html);
I ran into it myself recently.




