Hi Hubert!
Congrats on your selection. I look forward to a great summer of code
in Wget this time around.
On 04/29, Hubert Tarasiuk wrote:
Hello developers,
My proposal for *Speed up Wget's Download Mechanism* has been accepted
by the mentors!
There are two tasks to be done there:
- conditional GET requests (if-modified-since) (RFC7232)
- TCP Fast Open (RFC7413)
A summarized version of my proposal is available:
http://pliki.h.trsk.org/gsoc/wget_public.pdf
IMHO it is quite obvious how the first feature should be implemented in
Wget. However, there is some more moving around needed to use TFO. I
have proposed two possible ways in the above PDF. Perhaps you can
express your opinion about the approaches, or you have another idea for
accomplishing it?
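To make the first task concrete, here is a minimal Python sketch of the
client side of a conditional GET (RFC 7232). It is only an illustration
and not Wget code: the helper names if_modified_since_header and
needs_download are hypothetical, and it assumes the local file's mtime
is the timestamp we want to send.

```python
import email.utils
import os


def if_modified_since_header(path):
    """Build an If-Modified-Since header from the local copy's mtime.

    Hypothetical helper: returns an empty dict when there is no local
    copy, so the caller falls back to an unconditional GET.
    """
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return {}
    # HTTP dates are always expressed in GMT.
    return {"If-Modified-Since": email.utils.formatdate(mtime, usegmt=True)}


def needs_download(status_code):
    """A 304 Not Modified response means the local copy is still current."""
    return status_code != 304
```

With this shape, an unchanged file costs one request and a body-less
304 response instead of a full transfer.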
There are two separate points I want to make here:
1. With respect to the changes in the Wget source, I think it is saner
to merge the connect methods. Just ensure that we can handle proxies
and FTP connections without any code duplication. I don't think
there should be anything special when making an HTTPS connection?
2. Regarding the socket options, we should spend some more time
evaluating our options. My understanding of TCP_CORK is that it may
be a useful option for servers, but it doesn't really help TCP
clients. This is because TCP_CORK effectively raises the minimum
TCP packet size by buffering as much data as possible before sending
it out. With the small request sizes that an HTTP client would
generally send, I think it is better to follow Nagle's algorithm,
since TCP_CORK will not afford us any noticeable advantage. On the
other hand, its non-portability will be a nightmare for us when
trying to support OSX, BSD and Windows.
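To illustrate both points, here is a rough Python sketch of a single
connect path that tries TFO where the platform offers it and otherwise
falls back to a plain connect() and send, without touching TCP_CORK.
This is an assumption-laden sketch, not a proposal for the actual Wget
code: connect_and_send is a hypothetical name, it only probes for
MSG_FASTOPEN (which Python exposes on Linux), and it does not cover the
mechanisms other platforms use. The returned flag only tells which code
path ran, not whether the SYN really carried data.

```python
import socket


def connect_and_send(host, port, request, timeout=10.0):
    """Open a TCP connection and send `request` (hypothetical helper).

    Returns (sock, used_fastopen_call). Tries sendto() with MSG_FASTOPEN
    when the platform defines it, else falls back to connect() + sendall().
    """
    fastopen = getattr(socket, "MSG_FASTOPEN", None)
    last_error = None
    for family, socktype, proto, _, addr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        if fastopen is not None:
            sock = socket.socket(family, socktype, proto)
            try:
                # Blocking call: connects and sends in one step. The kernel
                # falls back to a normal handshake if it has no TFO cookie.
                sent = sock.sendto(request, fastopen, addr)
                if sent < len(request):
                    sock.sendall(request[sent:])
                sock.settimeout(timeout)
                return sock, True
            except OSError:
                # TFO unavailable or refused; retry the portable way.
                sock.close()
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(addr)
            sock.sendall(request)
            return sock, False
        except OSError as exc:
            last_error = exc
            sock.close()
    raise last_error or OSError("no usable address for %r" % (host,))
```

The same entry point could serve HTTP, proxy and FTP control
connections, since the only difference is the first chunk of data
handed to it.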
Another issue I am thinking about is how to test the TFO feature. I am
not very familiar with network API in Python, but my first idea would be
to count the TCP segments sent and received and/or to check that the
first packet (with SYN flag) contains data (the request). What do you think?
I haven't gone through this code thoroughly yet, but they tried to
reproduce the results of the original TFO whitepaper using a Python
HTTP Server, like the one we use for our test suite. Maybe we can
borrow some code from them?
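As for checking that the SYN carries the request, the test itself could
be as small as the sketch below. It assumes we already have the raw IPv4
packets from some capture mechanism (not shown here, and not decided);
syn_payload_length and build_test_packet are hypothetical names. Note
that with TFO the very first connection to a server only requests a
cookie, so data in the SYN should be expected from the second
connection onwards.

```python
import struct

TCP_PROTOCOL = 6
SYN_FLAG = 0x02
ACK_FLAG = 0x10


def syn_payload_length(packet):
    """Return the payload size of an initial TCP SYN, or None otherwise.

    `packet` is assumed to be a raw IPv4 packet (no link-layer header).
    """
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return None
    ip_header_len = (packet[0] & 0x0F) * 4
    total_len = struct.unpack("!H", packet[2:4])[0]
    if packet[9] != TCP_PROTOCOL or len(packet) < ip_header_len + 20:
        return None
    tcp = packet[ip_header_len:]
    tcp_header_len = (tcp[12] >> 4) * 4
    flags = tcp[13]
    # Only the client's initial SYN counts, not the server's SYN-ACK.
    if not flags & SYN_FLAG or flags & ACK_FLAG:
        return None
    return total_len - ip_header_len - tcp_header_len


def build_test_packet(flags, payload=b""):
    """Build a minimal IPv4+TCP packet for exercising the parser."""
    tcp = struct.pack("!HHIIBBHHH", 40000, 80, 1, 0, 5 << 4, flags,
                      65535, 0, 0) + payload
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(tcp), 0, 0, 64,
                     TCP_PROTOCOL, 0, bytes([127, 0, 0, 1]),
                     bytes([127, 0, 0, 1]))
    return ip + tcp
```

A TFO test would then assert that syn_payload_length() is greater than
zero for the SYN of the second connection.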