Re: [lwip-users] LWIP - TCP receive assert failed


From: Jackie
Subject: Re: [lwip-users] LWIP - TCP receive assert failed
Date: Thu, 22 Jan 2015 22:36:17 +0800

Hi Simon & Sylvain,

After re-examining the code, I now understand better what causes this problem. It is not that more than one thread calls tcp_output(); there is only one thread, tcpip_thread(), but within it tcp_output() is called recursively.

This happens when the lower-layer protocol (PPP) uses sys_sem_wait(). That function not only waits on a semaphore but also gives the timers a chance to run. If a timer expires during the wait, the TCP timer is called, and the timer calls tcp_output() again, like this:

tcpip_thread()
{
    ...
    tcp_input()
    {
        ...
        tcp_output()
        {
            ...
            pppifOutput()
            {
                ...
                sys_sem_wait();  // tcp_slowtmr() can run here and may call tcp_output() again
                some_ppp_write_func();
                ...
            }
            ...
        }
        ...
    }
    ...
}
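
The nesting above can be reproduced in a small stand-alone simulation. This is only a sketch: every fake_* function and run_scenario() are hypothetical stand-ins written for illustration, not lwIP or PPP code. It simply counts how deeply the output routine is nested when the semaphore wait does, or does not, service timers.

```c
#include <stdbool.h>

/* All names below are hypothetical stand-ins, not lwIP or PPP APIs. */

static int  output_depth     = 0;     /* current nesting of fake_tcp_output() */
static int  max_output_depth = 0;     /* deepest nesting seen in this run */
static bool timer_pending    = false; /* true while a TCP timer is due */

static void fake_tcp_output(void);

/* Stand-in for tcp_slowtmr(): its processing ends up in the output path. */
static void fake_tcp_slowtmr(void)
{
    fake_tcp_output();
}

/* Stand-in for a semaphore wait that services expired timers while waiting. */
static void fake_sem_wait_running_timers(void)
{
    if (timer_pending) {
        timer_pending = false;
        fake_tcp_slowtmr();
    }
}

/* Stand-in for a semaphore wait that only waits and never runs timers. */
static void fake_sem_wait_plain(void)
{
}

static void (*sem_wait_impl)(void) = fake_sem_wait_running_timers;

/* Stand-in for the PPP output routine that waits before writing. */
static void fake_ppp_output(void)
{
    sem_wait_impl();
}

/* Stand-in for tcp_output(): records how deeply it is nested. */
static void fake_tcp_output(void)
{
    output_depth++;
    if (output_depth > max_output_depth) {
        max_output_depth = output_depth;
    }
    fake_ppp_output();
    output_depth--;
}

/* Runs one simulated output with a timer already due and
 * returns the deepest nesting of fake_tcp_output() that occurred. */
int run_scenario(bool timers_inside_wait)
{
    output_depth     = 0;
    max_output_depth = 0;
    timer_pending    = true;
    sem_wait_impl    = timers_inside_wait ? fake_sem_wait_running_timers
                                          : fake_sem_wait_plain;
    fake_tcp_output();
    return max_output_depth;
}
```

In this sketch run_scenario(true) returns 2 (the output routine was re-entered from the timer), while run_scenario(false) returns 1.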

@Sylvain,
You are right, this PPP implementation comes from a third party, so I am not allowed to modify it much. But I think this design is quite buggy; in the worst case, tcp_output() can be called recursively several times.

So I think that in my case the issue is not related to the original lwIP design.

Best,
Jackie



On 01/17/15 04:47, address@hidden wrote:
Jackie:
After stress testing and debugging, with more than 10 hours of uploading data, I found that the PCB got corrupted in tcp_output(). tcp_output() can be blocked by the lower-level function call in tcp_output_segment() when the lower-layer protocol's buffer is full, so the upper layer is left pending. At the same time the TCP timer is running and tcp_slowtmr() also calls tcp_output(), so this tcp_output() is called before the

There you got the bug: when lwIP's threading requirements are observed, this can't happen: tcp_output() can never be called twice and thus does not have to be designed reentrant.

What you describe tells us that timers are checked from a different execution thread (thread or ISR) than output. But for the core lwIP code, you have to ensure this doesn't happen. That's all.

Of course this raises the problem of what to do with TX packets when e.g. your DMA queue is full. Usually it's best to add a 2nd (larger) software-queue that fills the DMA queue and to keep an upper limit on it. You'd then return ERR_IF when this limit is reached.
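
A rough sketch of that idea, under stated assumptions: the demo_* names, the slot counts and the error values are all made up for illustration (real code would use lwIP's err_t, ERR_OK and ERR_IF and the driver's own descriptor handling). The point is only that the output function never waits: it either queues the packet or refuses it.

```c
#include <stddef.h>

/* Hypothetical stand-ins; real code would use lwIP's err_t, ERR_OK, ERR_IF. */
typedef int demo_err_t;
#define DEMO_ERR_OK  0
#define DEMO_ERR_IF  (-1)   /* placeholder value, not lwIP's actual ERR_IF */

#define DMA_SLOTS  2        /* assumed number of hardware TX descriptors */
#define SWQ_LIMIT  4        /* assumed upper limit of the software queue */

static int         dma_in_flight = 0;   /* packets currently owned by DMA */
static const void *swq[SWQ_LIMIT];      /* software queue (ring buffer) */
static size_t      swq_head  = 0;
static size_t      swq_count = 0;

/* Move queued packets into free DMA slots, oldest first. */
static void demo_fill_dma(void)
{
    while (swq_count > 0 && dma_in_flight < DMA_SLOTS) {
        const void *pkt = swq[swq_head];
        (void)pkt;  /* a real driver would program a descriptor with pkt */
        swq_head = (swq_head + 1) % SWQ_LIMIT;
        swq_count--;
        dma_in_flight++;
    }
}

/* Non-blocking output: queue the packet or refuse it, never wait.
 * A real driver must also keep the packet buffer alive while it is queued
 * and protect the queue if completion is handled in an ISR. */
demo_err_t demo_link_output(const void *pkt)
{
    if (swq_count == SWQ_LIMIT) {
        return DEMO_ERR_IF;             /* limit reached: report, do not block */
    }
    swq[(swq_head + swq_count) % SWQ_LIMIT] = pkt;
    swq_count++;
    demo_fill_dma();
    return DEMO_ERR_OK;
}

/* Called when the hardware reports one finished transmission. */
void demo_tx_complete(void)
{
    if (dma_in_flight > 0) {
        dma_in_flight--;
    }
    demo_fill_dma();
}
```

With these assumed sizes, six packets are accepted back to back (two in DMA, four queued) and the seventh gets the error; one completed transmission frees room for exactly one more.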

Simon

