wget2 | Improve download continuation (-c / --continue) (#580)
From: |
@rockdaboot |
Subject: |
wget2 | Improve download continuation (-c / --continue) (#580) |
Date: |
Sun, 23 Jan 2022 11:38:08 +0000 |
Tim Rühsen created an issue: https://gitlab.com/gnuwget/wget2/-/issues/580
Wget can continue an interrupted download with the -c / --continue option; it
does so using the `Range:` request header.
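As a rough illustration of the mechanism (not wget2's actual code), the sketch below derives the `Range:` header from the size of the partial file on disk. The function name `range_header_for` is hypothetical.

```python
import os

def range_header_for(path):
    """Return the headers needed to resume `path`, or {} to download from scratch.

    Hypothetical helper: asks for everything from the current local size onward.
    """
    try:
        size = os.path.getsize(path)
    except OSError:
        # No partial file yet: nothing to continue.
        return {}
    if size == 0:
        return {}
    # Open-ended byte range: from offset `size` to the end of the resource.
    return {"Range": f"bytes={size}-"}
```

A 5-byte partial file would thus produce `Range: bytes=5-`.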
**Possible failure modes**
1. The server does not support the range header
2. The file on the server changed its content between the initial and the
continuation download
3. The file changes locally, e.g. due to user interaction, filesystem issues,
etc.
4. The file contents change during the network transmission (bit flips, MITM,
etc.)
**Proposed solutions**
I would like to mention that there is a technology called
[Metalink](https://en.wikipedia.org/wiki/Metalink), which deals with all of the
failure modes and which gives a near-perfect user experience. Wget2 supports
Metalink, though Metalink is not supported widely by servers.
So let's put Metalink aside and look at how we can improve the situation
without it.
For failure mode 1., there is nothing we can do but restart the download from
the beginning.
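A minimal sketch of how a client might detect failure mode 1, assuming it sent `Range: bytes=<offset>-`: a server that honors the range answers `206` with a `Content-Range:` header starting at that offset, while a server that ignores it typically answers `200` with the whole file. The name `resume_action` and its return values are hypothetical.

```python
import re

# Matches e.g. "bytes 5-9/10" or "bytes 5-9/*".
_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")

def resume_action(status, content_range, offset):
    """Return "append" if the response continues at `offset`, else "restart"."""
    if status == 206 and content_range:
        m = _CONTENT_RANGE.fullmatch(content_range.strip())
        if m and int(m.group(1)) == offset:
            return "append"
    # Anything else (200, missing or mismatching Content-Range, ...):
    # discard the partial file and download from the beginning.
    return "restart"
```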
For failure mode 4., the solution is HTTP over TLS (HTTPS, https://).
For the other failure modes (including 4. if HTTPS is not available), the user
has to verify the file integrity, e.g. via checksum (md5, sha1, ...). The
problem here is that this requires knowing the checksum (it must be provided,
e.g. by the server / website). And even with this knowledge, the user can only
generate and compare the checksum *after* the download has been completed. Some
files may have self-contained checksums that allow for integrity checks by
supporting applications (e.g. decompression may fail due to internal checksum
errors) - but this is beyond the knowledge of wget.
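For illustration, the post-download check could look like the sketch below. It assumes the expected checksum was obtained out of band as a hex string; the function name and the `sha256` default are assumptions, not something this issue proposes.

```python
import hashlib

def file_matches_checksum(path, expected_hex, algorithm="sha256"):
    """Hash the completed file in chunks and compare with the published digest."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in 64 KiB chunks so large downloads are not loaded into memory.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

As noted above, this can only run once the whole file is present, so it detects a bad continuation but cannot prevent one.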
Failure modes 2. and 3. can be treated the same: the local and the remote data
do not match. The solutions here could be
- the server's `ETag:` header must match. (Where do we store the ETag for a
partially downloaded file, if not in extended attributes, which not all file
systems support?) Not all servers send the `ETag:` header.
- the server's `Last-Modified:` header must match. (Same question: where to
store it? Not all servers provide it either.)
- start the continuation some bytes earlier than needed and compare the
overlapping bytes, which must match. (How many bytes would be good in the
general use case?) If the initial data is below a certain number of bytes, we
can forcefully restart the download from the beginning. (What would be a good
threshold?)
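The overlap idea from the last bullet could be sketched roughly as follows. Both helper names are hypothetical, and `overlap` and `min_size` are deliberately left as parameters, since suitable values are exactly the open questions above.

```python
def overlap_start(local_size, overlap, min_size):
    """Return the offset to request from, or 0 to restart from the beginning."""
    if local_size < min_size:
        # Too little local data to be worth verifying: just start over.
        return 0
    return max(local_size - overlap, 0)

def overlap_matches(path, start, remote_prefix):
    """Compare local bytes from `start` to EOF with the first bytes received."""
    with open(path, "rb") as f:
        f.seek(start)
        local_tail = f.read()
    # The response must cover the whole overlap and agree with it byte for byte.
    if len(remote_prefix) < len(local_tail):
        return False
    return remote_prefix[:len(local_tail)] == local_tail
```

If the overlap matches, the client would append only the bytes after the overlap; if not, local and remote data differ (failure modes 2 and 3) and a restart is needed. Note that a matching overlap only shows that the tail agrees, not that the rest of the file is unchanged.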