From: | address@hidden |
Subject: | Re: [lwip-users] tcp_enqueue problem, using socket: |
Date: | Wed, 19 Mar 2008 19:05:37 +0100 |
User-agent: | Thunderbird 2.0.0.12 (Macintosh/20080213) |
Jonathan Larmour wrote:
The comment could be better positioned. The bit "/* if ERR_MEM, we wait for sent_tcp or poll_tcp to be called" applies to the previous block. The bit "on other errors we don't try writing any more */" applies to the block the comment is presently in.
That would be my fault. The comment is indeed a little confusing!
If tcp_enqueue returns ERR_MEM, the present code is correct: it is not a fatal error, it just means that the write should be retried later. The netconn layer will do this as a result of its sent_tcp() and poll_tcp() callbacks. The calling thread will only be woken up when the data really is sent, or there _is_ a fatal error.
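To make the retry flow concrete, here is a minimal, self-contained sketch of the idea. It is a model only, not the lwIP code: every name in it (`model_conn`, `model_enqueue`, `model_try_write`, `model_on_sent`, the `MODEL_*` error values) is hypothetical, and it simplifies "the data really is sent" to "all data has been accepted by the queue".

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are NOT lwIP definitions. */
enum model_err { MODEL_OK = 0, MODEL_ERR_MEM = -1, MODEL_ERR_FATAL = -2 };

struct model_conn {
    size_t queue_free;    /* segments the stack can still queue */
    size_t pending;       /* bytes the application still wants written */
    int fatal;            /* set to 1 to simulate a fatal stack error */
    int caller_woken;     /* 1 once the blocked caller may continue */
    enum model_err result;
};

/* Stand-in for the enqueue step: accepts one chunk or reports "no memory". */
static enum model_err model_enqueue(struct model_conn *c, size_t chunk)
{
    if (c->fatal) return MODEL_ERR_FATAL;
    if (c->queue_free == 0) return MODEL_ERR_MEM;
    c->queue_free--;
    c->pending -= chunk;
    return MODEL_OK;
}

/* Runs for the initial write and again on every "sent"/"poll" event. */
static void model_try_write(struct model_conn *c, size_t chunk)
{
    while (c->pending > 0) {
        size_t n = c->pending < chunk ? c->pending : chunk;
        enum model_err err = model_enqueue(c, n);
        if (err == MODEL_ERR_MEM) {
            return;               /* not fatal: caller stays blocked, retry later */
        }
        if (err != MODEL_OK) {
            c->result = err;      /* fatal: wake the caller with the error */
            c->caller_woken = 1;
            return;
        }
    }
    c->result = MODEL_OK;         /* everything was accepted: wake the caller */
    c->caller_woken = 1;
}

/* Simulates an ACK freeing queue slots, followed by the retry. */
static void model_on_sent(struct model_conn *c, size_t freed, size_t chunk)
{
    c->queue_free += freed;
    model_try_write(c, chunk);
}
```

In this model the caller is woken in exactly two cases: all pending data has been accepted, or a fatal error occurred. An out-of-memory result simply leaves the work for the next event.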
Everything I read in this subject seems perfectly fine to me:
- data gets queued up, the remote host doesn't ACK fast enough
- at one point, the application thread using the socket API gets blocked while the tcpip thread _isn't_ blocked but processes the tcp pcb(s)
As already noticed, there are two settings that limit the data enqueued on one PCB: the sendbuf (in bytes) and the queuelen (the number of pbufs being queued - unsent and unacked - for one tcp pcb).
In my opinion, what Piero sees is intended behaviour of lwIP: the queuelen reaches the predefined limit. Note that this was a u8_t but is now (1.3.0) a u16_t, so by setting it to 0xffff, you can effectively disable this check if you want.
The fact that one limit is tested in api_msg.c (check sendbuf before calling tcp_write) and the other is checked in tcp_out.c is due to the nature of the limits: if the sendbuf can't accept all the data, we can send less data, but if the queuelen has reached the limit there is nothing that can be done in api_msg.c so no need to add extra code for it!
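The difference between the two limits can be sketched with a small self-contained model. This is an illustration under assumptions, not the lwIP implementation: `lim_pcb`, `lim_enqueue` and `lim_write` are hypothetical names, and the segment accounting is simplified to one queued pbuf per segment.

```c
#include <stddef.h>

/* Hypothetical model of one connection's send limits (not lwIP structs). */
struct lim_pcb {
    size_t sndbuf_free;   /* bytes still free in the send buffer */
    size_t queuelen;      /* pbufs currently queued (unsent + unacked) */
    size_t queuelen_max;  /* configured limit on queued pbufs */
};

/* Lower layer: all-or-nothing. Returns 0 on success, -1 if the queue
 * length limit would be exceeded (nothing is queued in that case). */
static int lim_enqueue(struct lim_pcb *p, size_t len, size_t seg_size)
{
    size_t segs = (len + seg_size - 1) / seg_size;  /* round up */
    if (p->queuelen + segs > p->queuelen_max) {
        return -1;
    }
    p->queuelen += segs;
    p->sndbuf_free -= len;
    return 0;
}

/* Upper layer: if the send buffer cannot take everything, send less.
 * Returns the number of bytes actually accepted. */
static size_t lim_write(struct lim_pcb *p, size_t len, size_t seg_size)
{
    size_t n = len < p->sndbuf_free ? len : p->sndbuf_free;
    if (n == 0) {
        return 0;                 /* buffer full: nothing to hand down */
    }
    if (lim_enqueue(p, n, seg_size) != 0) {
        return 0;                 /* queue limit hit: trimming cannot help */
    }
    return n;
}
```

With 1000 bytes of free send buffer, a 1500-byte write is trimmed to 1000 bytes by the upper layer. Once the queue length limit is reached, however, even a small write is refused, and the upper layer has nothing it could adjust.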
About the blocking of lwIP: as Jonathan already said, socket implementations DO block by default (in situations as described, for example). Most of them can be told to not block (using O_NONBLOCK or something). However, lwIP does not fully support this at the moment.
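For comparison, this is roughly what non-blocking mode looks like on a POSIX system. The sketch below uses only standard POSIX calls (`fcntl`, `socketpair`, `send`) and says nothing about lwIP's socket layer; the helper names `set_nonblocking`, `fill_until_would_block` and `demo_nonblocking_send` are made up for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Switch a descriptor to non-blocking mode; 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Keep sending until the kernel refuses more data.
 * Returns 1 if the send failed with EAGAIN/EWOULDBLOCK (the send buffer
 * is full and a blocking socket would have blocked here), 0 otherwise. */
static int fill_until_would_block(int fd)
{
    char chunk[1024];
    memset(chunk, 'x', sizeof chunk);
    for (;;) {
        ssize_t n = send(fd, chunk, sizeof chunk, 0);
        if (n >= 0) {
            continue;             /* data accepted, try to queue more */
        }
        if (errno == EINTR) {
            continue;             /* interrupted, simply retry */
        }
        return (errno == EAGAIN || errno == EWOULDBLOCK) ? 1 : 0;
    }
}

/* Demo: the peer never reads, so the sender's buffer eventually fills. */
static int demo_nonblocking_send(void)
{
    int sv[2];
    int r;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        return -1;
    }
    r = (set_nonblocking(sv[0]) == 0) ? fill_until_would_block(sv[0]) : -1;
    close(sv[0]);
    close(sv[1]);
    return r;
}
```

Without `O_NONBLOCK`, the same loop would stall inside `send()` once the buffer is full, which matches the default blocking behaviour described above.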
The fact that the problem disappears when disabling the Nagle algorithm could indeed mean that the Nagle algorithm has a bug. But to get to the source of this, a detailed analysis of the packet flow (using ethereal/wireshark) as well as a log output of the target running lwIP would be useful!
Simon