wget-dev

wget2 | Improve download continuation (-c / --continue) (#580)


From: @rockdaboot
Subject: wget2 | Improve download continuation (-c / --continue) (#580)
Date: Sun, 23 Jan 2022 11:38:08 +0000


Tim Rühsen created an issue: https://gitlab.com/gnuwget/wget2/-/issues/580



Wget can continue an interrupted download with the -c / --continue option; it 
does so using the `Range:` request header.
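
To make the mechanism concrete, here is a minimal sketch of how a client could derive the `Range:` header from the size of the partial file. The function name `resume_headers` is made up for illustration and this is not how wget2 implements it.

```python
import os

def resume_headers(path):
    """Return the extra request headers needed to continue a download into `path`.

    Hypothetical helper: an empty dict means "no partial file, do a normal request".
    """
    try:
        offset = os.path.getsize(path)
    except OSError:
        # No partial file (or it is unreadable): nothing to continue.
        offset = 0
    if offset == 0:
        return {}
    # Open-ended byte range: everything from `offset` to the end of the resource.
    return {"Range": f"bytes={offset}-"}
```

With a 5-byte partial file this yields `Range: bytes=5-`, i.e. the request asks for everything after the bytes that are already on disk.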

**Possible failure modes**
1. The server does not support the `Range:` header.
2. The file on the server changed its content between the initial and the 
continuation download.
3. The file changes locally, e.g. due to user interaction, filesystem issues, 
etc.
4. The file contents change during network transmission (bit flips, MITM, 
etc.).

**Proposed solutions**

I would like to mention that there is a technology called 
[Metalink](https://en.wikipedia.org/wiki/Metalink), which deals with all of the 
failure modes and which gives a near-perfect user experience. Wget2 supports 
Metalink, though Metalink is not supported widely by servers.

So let's put Metalink aside and look at how we can improve the situation 
without it.

For failure mode 1, there is nothing we can do but restart the download from 
the beginning.
For failure mode 4, the solution is HTTP over TLS (HTTPS, https://).
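
For failure mode 1, the client first has to notice that the server ignored the range request. A rough sketch, assuming the client sent `Range: bytes=<offset>-` and has the response status and `Content-Range:` value at hand; `resume_action` and its return values are invented for this example.

```python
import re

def resume_action(status, content_range, offset):
    """Decide what to do with the response to a 'Range: bytes=<offset>-' request.

    Returns 'append' (write the body after the existing data),
    'restart' (discard local data, the body starts at byte 0 or is unusable),
    or 'other' (any status this sketch does not handle, e.g. errors).
    """
    if status == 206:
        # Partial Content: check that the range really starts where we asked.
        m = re.match(r"bytes (\d+)-(\d+)/(\d+|\*)$", (content_range or "").strip())
        if m and int(m.group(1)) == offset:
            return "append"
        return "restart"
    if status == 200:
        # The server ignored the Range header and sends the whole file.
        return "restart"
    return "other"
```

A `200` answer to a range request is the usual sign of a server without range support, which is why it maps to a restart here.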

For the other failure modes (including 4 if HTTPS is not available), the user 
has to verify the file integrity, e.g. via a checksum (md5, sha1, ...). The 
problem here is that this requires knowing the checksum (it must be provided, 
e.g. by the server / website). And even with this knowledge, the user can only 
generate and compare the checksum *after* the download has completed. Some 
files may have self-contained checksums that allow integrity checks by 
supporting applications (e.g. decompression may fail due to internal checksum 
errors), but this is beyond what wget knows about the file.
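
The manual check described above could look roughly like this. It is only a sketch: `file_digest` and `matches_published` are illustrative names, and the expected value is assumed to come from the website as a hex string.

```python
import hashlib

def file_digest(path, algorithm="sha256", chunk_size=1 << 16):
    """Hash the file in chunks so large downloads need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path, expected_hex, algorithm="sha256"):
    """Compare the local digest with a published hex digest (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

Note that this can only run once the whole file is on disk, which is exactly the limitation mentioned above.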

Failure modes 2 and 3 can be treated the same: the local and the remote data 
do not match. The solutions here could be
- the server's `ETag:` header must match. (Where do we store the ETag for a 
partially downloaded file, if not in extended attributes, which not all file 
systems support?) Not all servers send the `ETag:` header.
- the server's `Last-Modified:` header must match. (Same question: where to 
store it? Not all servers provide it.)
- start the continuation some bytes earlier than needed and compare the 
overlapping bytes (they must match). (How many bytes would be good in the 
general use case?) If the initial data is below a certain number of bytes, we 
can forcefully restart the download from the beginning. (What would be a good 
threshold?)
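
The overlap idea from the last bullet can be sketched as follows. The numbers `overlap=4096` and `min_size=65536` are placeholders picked for the example, not proposed answers to the open questions, and both function names are hypothetical.

```python
def overlap_request_offset(local_size, overlap=4096, min_size=65536):
    """Return the byte offset to request from; 0 means restart from scratch.

    Placeholder policy: small partial files are not worth resuming,
    larger ones are resumed `overlap` bytes before their end.
    """
    if local_size < min_size:
        return 0
    return max(0, local_size - overlap)

def tail_matches(path, start, remote_prefix):
    """Check that local bytes from `start` to EOF equal the start of the ranged body.

    `remote_prefix` is assumed to hold at least as many bytes as the local tail.
    """
    with open(path, "rb") as f:
        f.seek(start)
        local_tail = f.read()
    return remote_prefix.startswith(local_tail)
```

If `tail_matches` returns false, local and remote data differ (failure modes 2 or 3) and the download would have to restart from the beginning. A match only covers the overlapping bytes, so it lowers the risk but is no proof that the rest of the file is unchanged.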

-- 
Reply to this email directly or view it on GitLab: 
https://gitlab.com/gnuwget/wget2/-/issues/580