wget-dev

Re: wget | wget should save directory listings as index.html (#11)


From: @rockdaboot
Subject: Re: wget | wget should save directory listings as index.html (#11)
Date: Wed, 25 May 2022 17:58:03 +0000



Tim Rühsen commented:


> Honestly, I don't think that to have different content for directory and 
> directory/ is a good idea.

ACK :-) But I see this regularly with pages/sites served by MS IIS. So it is 
not uncommon.

> And in this case directory.1 would just not work, because the simplest file 
> server will return index.html for a directory, but not some directory.1 
> (neither users, nor site links will know nothing about directory.1).

With --convert-links, the links in the mirrored site will point to 
`directory.1`, so any user clicking on HTML links should be fine. This is not 
true for links generated by JS, but let's put that aside as we can't do 
anything about it.
Users navigating directly to links should be fine too, because they copy&pasted 
them from the mirrored site (!?).
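
To illustrate the renaming and link rewriting discussed here, a minimal Python sketch follows. It is not wget's actual code: `assign_local_names` and `convert_links` are hypothetical helpers, and both the numeric-suffix rule and the naive `href` string replacement are simplifying assumptions.

```python
def assign_local_names(urls):
    """Map each URL path to a unique local name (assumed rule: .1, .2, ... on clashes)."""
    taken = set()
    mapping = {}
    for url in urls:
        # 'directory' and 'directory/' both want the local name 'directory'.
        base = url.rstrip("/") or "index.html"
        name, n = base, 0
        while name in taken:
            n += 1
            name = f"{base}.{n}"
        taken.add(name)
        mapping[url] = name
    return mapping


def convert_links(html, mapping):
    """Rewrite href attributes so they point at the local names (naive string replace)."""
    for url, local in mapping.items():
        html = html.replace(f'href="{url}"', f'href="{local}"')
    return html


mapping = assign_local_names(["directory", "directory/"])
page = convert_links('<a href="directory/">listing</a>', mapping)
```

In this sketch the second URL ends up as `directory.1`, and the link in `page` follows it, which is why clicking through the mirrored HTML keeps working even though the name is not one a file server would pick on its own.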

But even if we agree on using only a single file for the contents of 
`directory`, `directory/` and `directory/index.html`: which one do you prefer? 
Keep in mind that those can appear (be downloaded) in any order.

Should we define a priority / order?
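
One way to make the result independent of download order is a fixed ranking of the three URL forms. The Python sketch below only illustrates that idea: the ranking itself (`directory/index.html` over `directory/` over `directory`) is an arbitrary assumption and not a decision from this thread, and `IndexStore` is a hypothetical name.

```python
# Assumed ranking, higher wins. The actual order is exactly what is still open.
PRIORITY = {"index": 3, "slash": 2, "bare": 1}


def classify(path):
    """Tell which of the three URL forms a path has."""
    if path.endswith("/index.html"):
        return "index"
    if path.endswith("/"):
        return "slash"
    return "bare"


def directory_key(path):
    """Reduce all three forms to the same directory name."""
    if path.endswith("/index.html"):
        return path[: -len("/index.html")]
    return path.rstrip("/")


class IndexStore:
    """Keeps one document per directory, chosen by rank and not by arrival order."""

    def __init__(self):
        self.saved = {}  # directory name -> (rank, content)

    def offer(self, path, content):
        key = directory_key(path)
        rank = PRIORITY[classify(path)]
        current = self.saved.get(key)
        if current is None or rank > current[0]:
            self.saved[key] = (rank, content)
            return True  # this content replaced what was kept before
        return False  # a higher or equally ranked form is already kept
```

Offering the same three documents in any order leaves the same entry in `saved`, which is the property a priority rule would have to guarantee.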

Also, what happens to an already saved file `directory` when we later see 
`directory/whatever`? Should we rename it to `directory/index.html`? And if 
'whatever' is itself 'index.html', what exactly do we do then?
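
The rename case can be sketched in Python as below. This is an illustration under assumptions, not wget code: `make_room_for` is a hypothetical helper, the temporary `.tmp-move` name is made up, and the `index.html` clash is only reported to the caller because how to resolve it is still undecided.

```python
from pathlib import Path


def make_room_for(root: Path, rel_path: str):
    """Return (target, conflict) for rel_path below root.

    Any plain file that blocks a needed directory is moved to
    <that directory>/index.html. conflict is True when the target
    already exists afterwards (the unresolved index.html case).
    """
    target = root / rel_path
    # Ancestors of the target that lie strictly below root.
    ancestors = [p for p in target.parents if root in p.parents]
    for ancestor in reversed(ancestors):  # outermost first
        if ancestor.is_file():
            tmp = ancestor.with_name(ancestor.name + ".tmp-move")
            ancestor.rename(tmp)  # free the name
            ancestor.mkdir()  # the name becomes a directory
            tmp.rename(ancestor / "index.html")
    target.parent.mkdir(parents=True, exist_ok=True)
    return target, target.exists()
```

For `directory/whatever` the old file simply becomes `directory/index.html`. For `directory/index.html` the function returns `conflict=True` and leaves the choice (overwrite, keep, or add a suffix) to the caller.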

If we are able to come up with a precise algorithm that covers all the corner 
cases, someone can put that into code. Additional command line options to tune 
the behavior can come at a later point.

-- 
Reply to this email directly or view it on GitLab: 
https://gitlab.com/gnuwget/wget/-/issues/11#note_960182379



