lwip-users

Re: [lwip-users] sporadic stall of outgoing TCP connection with lwip 2.2


From: Andrew Tridgell
Subject: Re: [lwip-users] sporadic stall of outgoing TCP connection with lwip 2.2.0
Date: Mon, 4 Dec 2023 07:00:12 +1100

Following up on my own question: Matthew Ridley has found the issue. Some of the memory used for the frame buffers in the MACv2 driver in ChibiOS was not in non-cacheable memory, which resulted in lots of TCP checksum errors.
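Why a cache problem shows up as TCP checksum errors: the TCP checksum is the 16-bit one's-complement sum (RFC 1071) over a pseudo-header, the TCP header and the payload, and a receiver silently discards any segment whose checksum does not verify, so that segment is never ACKed. One plausible way cacheable frame buffers cause this is a checksum computed in software over data the CPU still holds in the D-cache while the MAC's DMA sends older contents from RAM; the thread does not say which side saw the errors. The sketch below computes the RFC 1071 sum over a plain byte buffer (pseudo-header omitted for brevity) and shows that bytes changed after the checksum was taken no longer verify. `inet_checksum` and `checksum_ok` are illustrative names, not lwip APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum: one's complement of the one's-complement
 * sum of big-endian 16-bit words; an odd trailing byte is zero-padded.
 * The 32-bit accumulator cannot overflow for frame-sized buffers. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(data != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2) {
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    }
    if (len & 1) {
        sum += (uint32_t)(data[len - 1] << 8);
    }
    /* Fold carries back into the low 16 bits. */
    while (sum >> 16) {
        sum = (sum & 0xFFFFu) + (sum >> 16);
    }
    return (uint16_t)~sum;
}

/* Receiver-side check: recompute over the bytes that actually arrived
 * and compare with the checksum carried in the header. */
static bool checksum_ok(const uint8_t *data, size_t len, uint16_t csum)
{
    return inet_checksum(data, len) == csum;
}
```

For example, if the sender checksums a 1024-byte payload and one byte differs in the copy the DMA puts on the wire, `checksum_ok()` on the received bytes returns false and the receiver drops the segment without ACKing it.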
For now I've added cache invalidate and flush operations, and the speed has gone up from a few kbyte/sec to 4.5 MByte/sec. I'll also look into how we ended up with cacheable memory being passed down to the MAC layer.
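The invalidate/flush workaround follows the usual rule for a cacheable DMA buffer on a Cortex-M7 such as the STM32H7: clean (flush) the D-cache over a TX buffer before the MAC's DMA reads it, and invalidate the D-cache over an RX buffer after the DMA has written it and before the CPU reads it. Cache maintenance by address works on whole 32-byte lines, so the range must be widened to line boundaries, and invalidating is only safe when the buffer owns every byte of the lines it touches. The host-runnable sketch below does just that arithmetic; on the target the resulting span would be passed to CMSIS `SCB_CleanDCache_by_Addr()` or `SCB_InvalidateDCache_by_Addr()`. It is not the actual ArduPilot or ChibiOS change, and `cache_span_for` and `safe_to_invalidate` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cortex-M7 (STM32H7) data-cache line size in bytes. */
#define DCACHE_LINE_SIZE 32u

/* Line-aligned span covering [addr, addr + len). */
struct cache_span {
    uintptr_t start; /* addr rounded down to a line boundary */
    size_t len;      /* rounded up to a whole number of lines */
};

/* Compute the span to clean (before TX DMA) or invalidate (after RX DMA).
 * On target: SCB_CleanDCache_by_Addr / SCB_InvalidateDCache_by_Addr. */
static struct cache_span cache_span_for(uintptr_t addr, size_t len)
{
    struct cache_span s;
    uintptr_t mask = (uintptr_t)DCACHE_LINE_SIZE - 1u;
    uintptr_t end = addr + len;

    assert(len > 0);
    s.start = addr & ~mask;
    s.len = (size_t)(((end + mask) & ~mask) - s.start);
    return s;
}

/* Invalidation discards whole lines, including any dirty neighbouring
 * data sharing them, so RX buffers should be line-aligned and sized in
 * whole lines. Returns 1 if [addr, addr + len) meets that requirement. */
static int safe_to_invalidate(uintptr_t addr, size_t len)
{
    uintptr_t mask = (uintptr_t)DCACHE_LINE_SIZE - 1u;
    return (addr & mask) == 0 && (len & mask) == 0;
}
```

For example, an 8-byte object at 0x24000010 needs the full line at 0x24000000 cleaned, while a 1536-byte RX buffer at 0x24000000 is already line-aligned and can be invalidated without touching anything else. Placing the buffers in a non-cacheable MPU region, as the driver apparently intended, avoids the maintenance calls altogether.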


On Sun, 3 Dec 2023 at 13:26, Andrew Tridgell <tridge60@gmail.com> wrote:
The ArduPilot project has recently implemented lwip 2.2.0 on ChibiOS 21.11.3 on STM32H7 (specifically H743 and H757). It is generally working well (thanks!), but we have an issue with sporadic stalling of outgoing TCP connections within a LAN.
The test I'm running is a continuous write of 1k blocks to the TCP discard service hosted by xinetd on Linux 6.4.6 (although I have also reproduced the issue writing to a Windows 10 server).
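The discard service (RFC 863) listens on TCP port 9 and throws away everything it receives, so a sender that keeps writing 1k blocks measures raw outbound TCP throughput. A minimal sketch of such a sender follows, written against POSIX sockets so it can run on a host; lwip's optional BSD-style socket layer offers equivalent `socket`/`connect`/`send` calls, but the ArduPilot test code is not shown in this thread, and `connect_discard` and `send_blocks` are hypothetical names.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 1024
#define DISCARD_PORT 9 /* RFC 863 discard service */

/* Write nblocks blocks of BLOCK_SIZE bytes to fd, retrying short writes
 * and EINTR. Returns the total number of bytes written, or -1 on error. */
static long send_blocks(int fd, long nblocks)
{
    char block[BLOCK_SIZE];
    long total = 0;

    assert(fd >= 0 && nblocks >= 0);
    memset(block, 'A', sizeof block);
    for (long b = 0; b < nblocks; b++) {
        size_t off = 0;
        while (off < sizeof block) {
            ssize_t n = send(fd, block + off, sizeof block - off, 0);
            if (n < 0) {
                if (errno == EINTR) continue;
                return -1;
            }
            off += (size_t)n;
            total += n;
        }
    }
    return total;
}

/* Open a TCP connection to the discard service at an IPv4 address
 * given in dotted-quad form. Returns a socket fd, or -1 on failure. */
static int connect_discard(const char *ipv4)
{
    struct sockaddr_in sa;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0) return -1;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(DISCARD_PORT);
    if (inet_pton(AF_INET, ipv4, &sa.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *)&sa, sizeof sa) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Calling `connect_discard("192.168.13.15")` and then `send_blocks()` repeatedly while timing the calls gives a throughput figure comparable to the MByte/s numbers quoted in this thread.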
I've put a wireshark capture here:
http://uav.tridgell.net/tmp/TCP_test_discard2.pcapng
My TCP knowledge is a bit rusty (I read Stevens a very long time ago and have forgotten far too much of it), but it looks like Linux is rejecting some seemingly valid frames from lwip for some reason.
In the capture the interesting stuff happens between frames 2310 and 2326, with frame 2325 having a delay of 1.26 seconds.
In this capture 192.168.13.14 is lwip 2.2.0, and 192.168.13.15 is the Linux 6.4.6 box.
The key relative sequence number is 1065425. The data frame with that sequence number is sent by lwip a total of 3 times, in frames 2317, 2324 and 2325. Frame 2323 is the third duplicate ACK, which triggers fast retransmission, and lwip dutifully retransmits in frame 2324. Linux still doesn't ACK it, though, until lwip transmits it again 1.26 s later in frame 2325; Linux finally ACKs in frame 2326, and the TCP connection gets unstuck and continues.
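The exchange in frames 2317 to 2326 matches standard TCP loss recovery (RFC 5681): the receiver keeps re-ACKing the last in-order byte it holds, each repeat is a duplicate ACK, and on the third duplicate the sender fast-retransmits the missing segment. If that copy is also not ACKed (per the follow-up, most likely because it too failed the checksum), recovery falls back to the retransmission timer, which is consistent with the 1.26 s gap before frame 2325. The sketch below shows only the sender-side duplicate-ACK counting; `struct rexmit_state` and `on_ack` are hypothetical and are not lwip's internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DUPACK_THRESHOLD 3 /* RFC 5681 fast-retransmit threshold */

struct rexmit_state {
    uint32_t last_ack; /* highest cumulative ACK seen so far */
    uint32_t snd_nxt;  /* next sequence number the sender will use */
    unsigned dupacks;  /* consecutive duplicate ACKs */
};

/* Process one incoming ACK number (window updates and SACK ignored).
 * Returns true exactly when the segment starting at last_ack should be
 * fast-retransmitted, i.e. on the third duplicate ACK. */
static bool on_ack(struct rexmit_state *st, uint32_t ack)
{
    assert(st != NULL);
    if (ack == st->last_ack && st->snd_nxt != st->last_ack) {
        /* Duplicate: nothing new acknowledged while data is outstanding. */
        st->dupacks++;
        return st->dupacks == DUPACK_THRESHOLD;
    }
    if ((int32_t)(ack - st->last_ack) > 0) {
        /* New data acknowledged (serial-number comparison): reset. */
        st->last_ack = ack;
        st->dupacks = 0;
    }
    return false;
}
```

Further duplicates after the third do not trigger another fast retransmit; if the retransmitted segment is dropped again, only the retransmission timeout recovers it, which is why each lost retransmission costs so much throughput.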
The Wireshark capture was made on the Linux box receiving the traffic, so I know the Linux kernel is seeing these frames. Why doesn't it ACK earlier? Is lwip perhaps violating some TCP windowing rule?
Any suggestions on how to debug this? It is bringing our TCP throughput down from 1.7MByte/s to a couple of kbytes/second.
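One way to check from the Linux side whether segments are being discarded for bad checksums, rather than for a windowing problem, is the kernel's cumulative TCP `InCsumErrors` counter in `/proc/net/snmp`: if it climbs while the stall is reproduced, the receiver is dropping corrupt segments before it can ACK them. A capture alone may not reveal this, because Wireshark does not validate TCP checksums by default. The parser below is a hypothetical helper, sketched on the assumption of the usual `/proc/net/snmp` layout (a `Tcp:` line of counter names followed by a `Tcp:` line of values).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Return TCP counter `name` from /proc/net/snmp-style text, or -1 if it
 * is absent. Names and values are matched by column position. */
static long long snmp_tcp_counter(const char *text, const char *name)
{
    assert(text != NULL && name != NULL);
    const char *hdr = strstr(text, "Tcp:");
    if (hdr == NULL) return -1;
    const char *vals = strstr(hdr + 4, "Tcp:");
    if (vals == NULL) return -1;

    hdr += 4;
    vals += 4;
    for (;;) {
        char key[64];
        long long value;
        int nk = 0, nv = 0;
        if (sscanf(hdr, "%63s%n", key, &nk) != 1 ||
            sscanf(vals, "%lld%n", &value, &nv) != 1)
            return -1;
        if (strcmp(key, "Tcp:") == 0) return -1; /* ran past the header */
        if (strcmp(key, name) == 0) return value;
        hdr += nk;
        vals += nv;
    }
}

/* Read /proc/net/snmp (Linux only) into buf; returns 0 on success. */
static int read_proc_snmp(char *buf, size_t size)
{
    FILE *f = fopen("/proc/net/snmp", "r");
    if (f == NULL || size == 0) {
        if (f != NULL) fclose(f);
        return -1;
    }
    size_t n = fread(buf, 1, size - 1, f);
    fclose(f);
    buf[n] = '\0';
    return 0;
}
```

Since the counter is cumulative, sample it with `read_proc_snmp()` and `snmp_tcp_counter(buf, "InCsumErrors")` before and after reproducing the stall and compare the two values.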
The lwipopts.h config is here:
https://github.com/ArduPilot/ardupilot/blob/master/libraries/AP_HAL_ChibiOS/hwdef/common/lwipopts.h
thanks for any suggestions!
Cheers, Tridge

