Re: wget2 | Add FTP & FTPS support (#3)

From: @rockdaboot
Subject: Re: wget2 | Add FTP & FTPS support (#3)
Date: Sun, 04 Jul 2021 18:07:47 +0000

Tim Rühsen commented on a discussion: 

If I understand you correctly, you think that FTP in wget2 will be N times 
faster than FTP in wget.

Given that the bottleneck is likely the network throughput, wget2 won't be 
faster than wget. While wget2 may download N files in parallel, each of them 
will be transferred at a speed of bandwidth/N. The wall-clock time for this 
will be the same as transferring N files sequentially at full bandwidth.
The underlying assumptions here are that
1. the FTP server is not the bottleneck (if it is, parallel downloads can even 
be slower than serial ones)
2. network bandwidth *or* disk write bandwidth is the bottleneck (not CPU)
3. the files are reasonably large, as often seen in science (so RTTs from FTP 
protocol communication are negligible)
4. the FTP server allows N parallel connections from the same IP
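
To make the bandwidth/N argument concrete, here is a minimal sketch of a toy model. It assumes a single shared link, equal file sizes, and a fixed per-file overhead standing in for the FTP protocol round trips; the function names and parameters are made up for illustration and are not part of wget or wget2.

```python
def sequential_time(n_files, file_size, bandwidth, per_file_overhead):
    """Wall-clock seconds to fetch n_files one after another at full bandwidth."""
    return n_files * (per_file_overhead + file_size / bandwidth)


def parallel_time(n_files, file_size, bandwidth, per_file_overhead):
    """Wall-clock seconds to fetch n_files at once, each stream at bandwidth / n_files."""
    per_stream_bandwidth = bandwidth / n_files
    # All streams pay the per-file overhead at the same time, so it counts once.
    return per_file_overhead + file_size / per_stream_bandwidth


def speedup(n_files, file_size, bandwidth, per_file_overhead):
    """Ratio of sequential to parallel wall-clock time under this toy model."""
    seq = sequential_time(n_files, file_size, bandwidth, per_file_overhead)
    par = parallel_time(n_files, file_size, bandwidth, per_file_overhead)
    return seq / par


if __name__ == "__main__":
    # Sizes in bytes, bandwidth in bytes per second, overhead in seconds.
    print("large files:", speedup(4, 1e9, 1e8, 0.2))
    print("small files:", speedup(4, 1e4, 1e8, 0.2))
```

In this model the speedup stays close to 1 while the transfer time dominates the overhead, and it approaches N only when the files are so small that the per-file overhead dominates.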

So there are only a few situations where Wget2 could improve the download time:
1. many small files to be downloaded
2. the list of FTP URLs contains more than one domain (the list could be split 
by domain and several instances of wget could be started in parallel)
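
Splitting a URL list by domain, as mentioned in point 2, could look roughly like this. It is only a sketch: the helper name `split_by_host` and the example hosts are hypothetical, and it groups by the host name found in each URL.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def split_by_host(urls):
    """Group URLs by host name, keeping the input order within each group."""
    groups = defaultdict(list)
    for url in urls:
        # hostname is None for malformed URLs; collect those under "".
        host = urlsplit(url).hostname or ""
        groups[host].append(url)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical example hosts, for illustration only.
    example_urls = [
        "ftp://ftp.example.org/pub/a.dat",
        "ftp://mirror.example.net/data/b.dat",
        "ftp://ftp.example.org/pub/c.dat",
    ]
    for host, host_urls in split_by_host(example_urls).items():
        print(host, len(host_urls))
```

Each group could then be written to its own file and handed to a separate wget process, for example via `--input-file`.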

Wget2 has several improvements to speed up file transfers. I'd say the 
combination of HTTP/2 and compression is the biggest win over wget. (Often it 
is gzip, but some servers support brotli or zstd, which are much better than 
gzip in terms of compression ratio and decompression CPU usage.)
Neither is available for FTP.

Slightly OT, but please consider any download via FTP (or any other non-secure 
protocol) as tainted, even when downloading within your faculty. Since no one 
ever checks the file integrity (this is tedious manual work), everybody should 
use a secure channel for downloading. Or the other way round: if you upload 
your data via FTP, how do you make sure the server received the correct data? 
(Data-internal checksums do not help against malicious intent.)
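
For anyone who does want to check integrity by hand, a minimal sketch follows. It assumes the expected SHA-256 digest was obtained over a separate, secure channel (a digest fetched over the same FTP connection would prove nothing against malicious intent); the function names are made up for illustration.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_trusted_digest(path, trusted_hex):
    """Compare the file's digest with one obtained over a secure channel."""
    expected = trusted_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```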

Back to the topic... I previously proposed an extra tool like `wget2-ftp` to 
keep the maintenance lower and scalable. The downside falls on users who 
download websites recursively and also want all the referenced FTP sites 
downloaded in one go.
Another option is to write a plugin for the FTP protocol - it keeps the code 
separate, the maintenance for libwget/wget2 would not increase (much). And the 
FTP code could have its own maintainer (scalable maintenance).

I'm happy to help out any volunteer.
