
[lwip-users] strange TCP behavior, connection stalls


From: M.H. ten Berge
Subject: [lwip-users] strange TCP behavior, connection stalls
Date: Sat, 9 Apr 2016 17:50:10 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.6.0

Hi all,

I'm running into a strange situation when I try to download a file (about 40 kB) from my local HTTP server. The first few kB go well, but then the transfer stalls. It picks up a few times, each time getting a few more kB downloaded, but then stalls again. The stalls become longer and longer, until the webserver times out and aborts the transfer. After the server times out, lwip keeps on trying (which is also odd) and gets stuck in lwip_read(). The behavior is reproducible every time; the download has not succeeded even once. I assume that the TCP protocol in LWIP has been tested in practice numerous times, so I must be doing something wrong... I was unsure whether I should report this as a bug against esp-open-rtos, but because there is much more LWIP knowledge on this list, I'll try here first ;-)
Sorry for the long mail.

Setup and versions:
- dhcp/dns/router at 192.168.101.1
- webserver: a local Debian stable box, running Apache 2.4.10. Its external IP is 217.19.31.195, internally it is at 192.168.102.22. The router takes care of this, other devices such as my phone can access it correctly.
- hardware: esp8266 (esp-03 module) at 192.168.101.237. It is about 1.5 meters away from the nearest Wifi access point. This AP is on channel 1, the ESP also reports that it connects on channel 1. I checked with Wifi Analyzer (Android app) that channel 1 is relatively free (the next AP is 20-25dB weaker). Other wireless devices can communicate absolutely fine via this access point.
- the esp8266 is running esp-open-rtos (master branch from https://github.com/Superhouse/esp-open-rtos). Two weeks ago, when I first encountered the problem, I was using the version from March 22nd. Today I tried again with HEAD from today: https://github.com/SuperHouse/esp-open-rtos/commit/83c5f91bc09168c584be9d62966c069cdfcfa2d9.
- LWIP is integrated in esp-open-rtos (as a git submodule). It uses the version from https://github.com/SuperHouse/esp-lwip/tree/3cf8d514bd76e6ef77e6fa514d0ec6d96da7fd9a According to the description on github, this is LWIP 1.4.1, with some modifications to get it running on the esp8266 (mainly the low-level network driver).
- local modifications (by me):
  + #define LWIP_POSIX_SOCKETS_IO_NAMES 0 (because the serial port functions are also called read/write, so they were colliding)
  + #define MEMP_OVERFLOW_CHECK  1
  + #define MEMP_SANITY_CHECK  1
  + #define LWIP_STATS  1
  + #define LWIP_DEBUG
  + enabled some debug categories (couldn't enable them all due to size constraints)
  + tried some different values for TCP_MSS, TCP_MAXRTX, TCP_SYNMAXRTX, etc. The packet capture and log were made with TCP_MSS=536, TCP_MAXRTX=12 and TCP_SYNMAXRTX=6.
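
For readers who want to reproduce the setup, the local modifications listed above would look roughly like this when collected in one place. This is only a sketch: it assumes the overrides go into lwipopts.h (lwIP's usual place for them), and the two debug categories at the end are illustrative, since the post does not say which ones were actually enabled.

```c
/* lwipopts.h fragment: a sketch of the local changes listed above. */
#define LWIP_POSIX_SOCKETS_IO_NAMES 0   /* avoid the clash with the serial read/write */
#define MEMP_OVERFLOW_CHECK         1
#define MEMP_SANITY_CHECK           1
#define LWIP_STATS                  1
#define LWIP_DEBUG

/* Values in effect for the attached packet capture and log. */
#define TCP_MSS                     536
#define TCP_MAXRTX                  12
#define TCP_SYNMAXRTX               6

/* Illustrative only: the post does not name the enabled debug categories. */
#define TCP_INPUT_DEBUG             LWIP_DBG_ON
#define SOCKETS_DEBUG               LWIP_DBG_ON
```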

What works:
- associate with wifi network
- get IP via DHCP
- DNS lookups
- HTTP GET request for a small PHP-script, which returns about 100 bytes of JSON.
- receiving this JSON reply and closing the socket
- I can also ping the esp8266. Ping times are high (>90 msec), but I assume this is caused by the enabled LWIP_DEBUG options, which have to be pushed through a serial console at 115200 bps.
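
The guess about debug output in the last item can be sanity-checked with a rough estimate of how long the serial console needs per byte. The helper below is hypothetical (the name serial_tx_us is made up for this sketch) and assumes 8N1 framing, i.e. 10 bits on the wire per byte, which the post does not state.

```c
#include <assert.h>
#include <stdint.h>

/* Microseconds needed to push `bytes` through a UART running at `baud`.
 * Assumes 8N1 framing (1 start + 8 data + 1 stop = 10 bits per byte);
 * that framing is an assumption, not something stated in the post. */
uint64_t serial_tx_us(uint32_t bytes, uint32_t baud)
{
    assert(baud > 0);
    return ((uint64_t)bytes * 10u * 1000000u) / baud;
}
```

Under that assumption, serial_tx_us(80, 115200) comes to roughly 7 ms for a single 80-character debug line, so a dozen such lines per packet would be in the range of the observed ping times, provided the debug printf blocks until the bytes are sent.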

What does not work:
- downloading a 40 kB static file from the same http server. The file is to be relayed to a serial port, but for now I have commented that out, so the data is simply discarded.

I've captured the console/debug messages, see lwiplog.txt.
Furthermore, I have run tcpdump on the webserver (this means it does not contain the dhcp or dns traffic). Because more http-requests were handled at the time, I have filtered out everything unrelated to 192.168.101.237.

What happens in the log:
- Everything starts fine (associate with wifi, get an IP, etc).
- Two odd things in the log/pcap so far:
  - the log contains two occurrences of 'DNS lookup found 44.131.255.63'. I have no idea what this IP is, or why it should have been looked up.
  - the first socket connection to the http server does not work (packets 1-7 in the pcap). The next try (packet 8) succeeds almost instantly.
- The HTTP GET request is done on lines 374-392 (my code has some extra printf's, which include the http request contents).
- The user program parses the http reply header byte-by-byte (quite ugly code, but it works). These are the 'API messages' in lines 506-764 (repeatedly calling lwip_read to request 1 byte).
- The actual content is downloaded using a loop:
  1. printf('r\n')
  2. call lwip_read() with a buffer of 512 bytes (also tried 16, this made no difference)
  3. printf('R\n')
  4. for each received byte, printf a dot
  5. go back to step 1
- the first read returns at line 783. The data is consumed in lines 784-844. lwip_read is immediately called again, and returns immediately (lines 844-846).
- this continues. However, new calls to lwip_read take longer and longer. For example: the function is called at line 1573, and does not return until line 1689.
- Finally, the web server gives up and closes the connection. LWIP does not return from the lwip_read function anymore. I had already stopped the logs before this happened, but if anyone is interested I could make new, longer logs.
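
For reference, the byte-by-byte header parsing and the download loop (steps 1-5) can be sketched as below. This is not the actual program from the post: the function names are hypothetical, and POSIX read() is used as a stand-in for lwip_read() so the sketch compiles on a desktop, on the assumption that lwip_read() follows the same convention (0 when the peer has closed the connection, a negative value on error). The printf calls from the steps are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>   /* POSIX read(); stand-in for lwip_read() */

/* Hypothetical helper: consume the reply one byte at a time until the
 * blank line ("\r\n\r\n") that ends an HTTP header.
 * Returns 0 on success, -1 if the connection closed or failed first. */
int skip_http_header(int fd)
{
    static const char end[] = "\r\n\r\n";
    size_t matched = 0;   /* how many bytes of `end` have been seen in a row */
    char c;

    while (matched < 4) {
        ssize_t n = read(fd, &c, 1);
        if (n <= 0) {
            return -1;    /* 0 = peer closed, <0 = error */
        }
        if (c == end[matched]) {
            matched++;
        } else {
            /* a stray '\r' may itself start a new terminator */
            matched = (c == '\r') ? 1 : 0;
        }
    }
    return 0;
}

/* Hypothetical helper: read and discard the body in chunks of `chunk`
 * bytes. Returns the number of body bytes seen, or -1 on a read error. */
long drain_body(int fd, size_t chunk)
{
    char buf[512];
    long total = 0;

    assert(chunk > 0 && chunk <= sizeof buf);
    for (;;) {
        ssize_t n = read(fd, buf, chunk);
        if (n < 0) {
            return -1;    /* read error */
        }
        if (n == 0) {
            break;        /* peer closed: stop instead of reading again */
        }
        total += n;       /* data is simply discarded, as in the post */
    }
    return total;
}
```

The point of the sketch is the handling of the return value: a loop that calls the read function again after it has returned 0 or a negative value would never terminate cleanly once the server closes the connection.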

I'll try to attach the console log (lwiplog.txt) and the packet capture (download_try005_filter.pcap). If that doesn't work, please find them here:
log: http://famtenberge.nl/dl/?t=2a7f5b49846ac565d131abed825528af
pcap: http://famtenberge.nl/dl/?t=0b65ff76dfd827225b3c238260af56ef

Does this problem sound familiar to anyone? What is going wrong here? Any help would be much appreciated. Thanks in advance!

Kind regards,

Matthijs

Attachment: lwiplog.txt
Description: Text document

Attachment: download_try005_filter.pcap
Description: application/vnd.tcpdump.pcap

