grub-devel

Re: [PATCH] tcp: ack when we get an OOO/lost packet


From: Josef Bacik
Subject: Re: [PATCH] tcp: ack when we get an OOO/lost packet
Date: Thu, 13 Aug 2015 13:40:13 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0

On 08/13/2015 01:13 PM, Andrei Borzenkov wrote:


On 13.08.2015 16:59, Josef Bacik wrote:
On 08/13/2015 04:19 AM, Andrei Borzenkov wrote:
On Wed, Aug 12, 2015 at 6:16 PM, Josef Bacik <address@hidden> wrote:
While adding TCP window scaling support I was finding that I'd get some packet loss or reordering when transferring from large distances and grub would just time out.  This is because we weren't ACK'ing when we got our OOO packet, so the sender didn't know it needed to retransmit anything, so eventually it would fill the window and stop transmitting, and we'd time out.  Fix this by ACK'ing when we don't find our next sequence numbered packet.  With this fix I no longer time out.  Thanks,

I have a feeling that your description is misleading. Patch simply
sends duplicated ACK, but partner does not know what has been received
and what has not, so it must wait for ACK timeout anyway before
retransmitting. What this patch may fix would be lost ACK packet
*from* GRUB, by increasing rate of ACK packets it sends. Do you have
packet trace for timeout case, ideally from both sides simultaneously?


The way Linux works is that if you get <configurable amount> of DUP
ACKs it triggers a retransmit.

Do you have pointers to documentation and code?

The tcp_reordering sysctl allows you to set how many DUP ACKs you get before retransmitting; you can see the comment above the function tcp_time_to_recover in the kernel. With no SACK support we rely on getting a certain number of DUP ACKs before retransmitting, since the out-of-order packets we want could still arrive in time, in which case we would not have to retransmit.
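
As a rough illustration of the DUP ACK threshold described above, here is a minimal sketch of a sender that counts duplicate cumulative ACKs and fast-retransmits once the count reaches a threshold. It is a simplified model, not the kernel's logic: the function name `fast_retransmit_trigger` and the parameter `dupack_threshold` are hypothetical, and the default of 3 is only a stand-in for whatever tcp_reordering is set to.

```python
def fast_retransmit_trigger(acks, dupack_threshold=3):
    """Return the sequence numbers a simplified sender would fast-retransmit.

    acks: cumulative ACK numbers in the order the sender receives them.
    dupack_threshold: hypothetical stand-in for the tcp_reordering setting.
    """
    retransmits = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            # Same ACK number again: the receiver is still waiting for this byte.
            dup_count += 1
            if dup_count == dupack_threshold:
                # Enough duplicates: resend the segment starting at `ack`
                # without waiting for the retransmission timer.
                retransmits.append(ack)
        else:
            # The ACK number changed: start counting duplicates afresh.
            last_ack = ack
            dup_count = 0
    return retransmits


if __name__ == "__main__":
    # One original ACK for 200 followed by three duplicates.
    print(fast_retransmit_trigger([100, 200, 200, 200, 200]))
```

If the receiver never sends the duplicates, this counter never moves and the sender has nothing but its timer to go on.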


I only have traces from the server since tcpdump doesn't work in grub (or if it does I don't know how to do it).

GRUB does not have tcpdump, but your switch quite likely has port
mirroring.


Big company, big datacenters, etc. I'm a file system developer, you are lucky I know how to spell tcpdump to begin with ;). The tcpdump on the server side supports my hypothesis: we send lots and lots of stuff, the grub box starts falling behind in its ACK responses because it's waiting for the next SEQ packet to come in, it ACKs when that packet does finally come in with the new next expected SEQ, and this degrades to the point where the sender has maxed out its send window and the grub box either has lost or has yet to receive the next packet it is waiting for and times out.

I can say for sure that we aren't getting the next packet we are looking for while getting a bunch of others, just from my instrumentation on the grub side. I _can't_ say for sure whether it is simple re-ordering or packet loss somewhere. With this patch we're definitely getting all of the DUP ACKs, at least there doesn't appear to be any missing in the range (like I see DUP ACK #1-#300 all in a row, not missing anybody).
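
The receiver behaviour described above can be sketched as a small simulation. This is an assumption-laden model, not GRUB's code: the function `receive`, the flag `ack_out_of_order`, and the buffering of out-of-order segments in a dict are all hypothetical choices made for illustration. It returns the ACK numbers the receiver would send for a given arrival order, with and without ACK'ing on out-of-order packets.

```python
def receive(segments, ack_out_of_order):
    """Simulate a simplified TCP receiver and return the ACK numbers it sends.

    segments: list of (seq, length) tuples in arrival order.
    ack_out_of_order: if True, send a duplicate ACK carrying the expected
        sequence number whenever a segment beyond it arrives.
    """
    expected = 0      # next in-order sequence number we are waiting for
    buffered = {}     # out-of-order segments held back: seq -> length
    acks = []
    for seq, length in segments:
        if seq == expected:
            expected += length
            # Consume any buffered segments that are now contiguous.
            while expected in buffered:
                expected += buffered.pop(expected)
            acks.append(expected)
        elif seq > expected:
            buffered[seq] = length
            if ack_out_of_order:
                # Duplicate ACK: repeat the sequence number still missing.
                acks.append(expected)
        # seq < expected: stale duplicate, ignored in this sketch.
    return acks


if __name__ == "__main__":
    # The segment starting at 100 is lost; later segments keep arriving.
    arrivals = [(0, 100), (200, 100), (300, 100), (400, 100)]
    print(receive(arrivals, ack_out_of_order=False))
    print(receive(arrivals, ack_out_of_order=True))
```

Without the duplicate ACKs the receiver goes silent after the first segment, which matches the stall seen in the server-side trace; with them, the sender sees a run of identical ACK numbers pointing at the missing segment.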

If you want I can change the commit log to say something like

"If we get an out of order packet we still need to ACK with the expected SEQ number so the sender knows we haven't received that packet yet and may need a retransmission."

To clear up any ambiguity.  Thanks,

Josef


