
[lwip-users] lwip lock


From: Tazzari Davide
Subject: [lwip-users] lwip lock
Date: Tue, 22 Mar 2011 16:41:41 +0100

Hi all,

I have created an AVR32 application based on FreeRTOS and lwIP 1.3.2.

My application is quite large, but I want to concentrate on my problems.

There are two tasks that use HTTP connections: a web server and a web client that talks to an external portal.

The application simply collects some data and, periodically, POSTs them to an Apache-based web portal.

The web server is of course active only when a browser connects; otherwise it sits blocked in a listen state.

Here is my problem.

Sometimes, somehow, all the TCP connections lock up and are lost: the web server is no longer accessible and the application can no longer communicate with the portal.

This seems to happen when I try to access the web server while, at the same time, the device is trying to reach the portal.

 

I started to analyze lwIP, and here is what I found.

In mem.c I added the following code:

static u8_t *ram;
/** the last entry, always unused! */
static struct mem *ram_end;
/** pointer to the lowest free block, this is used for faster search */
static struct mem *lfree;

u8_t **ppMemRam;            // DT 2011/03/09
/** the last entry, always unused! */
struct mem **ppMemRamEnd;   // DT 2011/03/09
/** pointer to the lowest free block, this is used for faster search */
struct mem **ppMemLFree;    // DT 2011/03/09

...

void
mem_init(void)
{
...
  ppMemRam    = &ram;       // DT 2011/03/09
  ppMemRamEnd = &ram_end;   // DT 2011/03/09
  ppMemLFree  = &lfree;     // DT 2011/03/09
}

This lets me inspect (through a serial debugger) the state of the heap area holding the lwIP data.

When the problem happens, the lfree pointer is stuck at an address different from ram.

I looked at the mem ram area and found that the chain of the various allocations was OK.

It seems something was not freed, for some reason unknown to me.

Sometimes this is not critical because access still works, but the wasted area grows little by little, eventually saturating the heap and locking up communication.
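
For reference, this is the kind of walker that can be run over those exported pointers to spot such areas (a minimal sketch; mem_walk and the printf output are my own additions, and it assumes the private struct mem layout of lwIP 1.3.2's mem.c):

/* Walk the lwIP heap from ram to ram_end and report each block.
 * Must live in mem.c, where the private struct mem
 * (mem_size_t next; mem_size_t prev; u8_t used;) is visible;
 * next/prev are byte offsets into the ram array. */
void mem_walk(void)
{
  struct mem *m = (struct mem *)*ppMemRam;
  while (m < *ppMemRamEnd) {
    printf("block @%p used=%u next=%u prev=%u\n",
           (void *)m, (unsigned)m->used, (unsigned)m->next, (unsigned)m->prev);
    if (m->next == 0) {
      break;               /* corrupted chain: avoid looping forever */
    }
    m = (struct mem *)(*ppMemRam + m->next);
  }
}

A block with used == 0 that lfree never reaches is a candidate for a lost area.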

 

I suppose this is not a cause but an effect, so I continued my analysis.

I concentrated on the memp area.

I studied it, without claiming to understand everything, but anyway here is what I discovered.

I show only the TCP_SEG pool, which seems relevant to me.

 

HEX    Offset  Delta  Block  Arg      RefCh  RefMem  Free
1E08   2564    20     0      TCP_SEG  0
1E1C   2584    20     1      TCP_SEG  1E08
1E30   2604    20     2      TCP_SEG  1E1C
1E44   2624    20     3      TCP_SEG  1E30
1E58   2644    20     4      TCP_SEG  1E44
1E6C   2664    20     5      TCP_SEG  1E58
1E80   2684    20     6      TCP_SEG  1E6C
1E94   2704    20     7      TCP_SEG  ?      1EE4
1EA8   2724    20     8      TCP_SEG  ?      0
1EBC   2744    20     9      TCP_SEG  1E80   -       xxx
1ED0   2764    20     10     TCP_SEG  ?      0
1EE4   2784    20     11     TCP_SEG  ?      0

 

Let me describe the columns:

HEX is the absolute address in memory of the memp block.
Offset is the absolute offset in bytes from the top of the whole memp structure.
Delta is the size of the single block.
Block is the index of the block.
RefCh is the address of the "next" block in the chain.
RefMem is the address of the "next" block found by walking the memory.
Free marks the first free block.

 

What it seems is that block 9 is the first free one. The next is block 6, then 5, 4, 3, 2, 1, 0.

Reading the memory I saw that block 7 is chained to block 11. These two blocks are chained together but no longer reachable from the free list.

Likewise, blocks 10 and 8 seem to be unreachable and chained to nothing.
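
This can also be cross-checked in code (a rough sketch, assuming the lwIP 1.3.2 memp internals: struct memp with a single next pointer, and the static memp_tab[] array of free-list heads in memp.c; memp_count_free is my own helper):

/* Count the blocks of a pool still reachable from its free list.
 * Place this in memp.c, where struct memp and memp_tab[] are visible. */
u16_t memp_count_free(memp_t type)
{
  u16_t n = 0;
  struct memp *m = memp_tab[type];   /* head of the free list */
  while (m != NULL) {
    n++;
    m = m->next;
  }
  return n;
}

If memp_count_free(MEMP_TCP_SEG) plus the segments really in use is less than the pool size (12 here), the difference is exactly the lost blocks.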

 

What I see is that these two phenomena are related: when I lose mem area I lose TCP_SEG blocks as well.

If we take a look at the tcp_seg structure:

 

struct tcp_seg {
  struct tcp_seg *next;    /* used when putting segments on a queue */
  struct pbuf *p;          /* buffer containing data + TCP header */
...

 

 

we can see that there is a reference to a pbuf. The lost tcp_seg blocks do refer to that lost mem area!
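
That coupling is visible in the way lwIP frees a segment; roughly (paraphrasing tcp_seg_free() from tcp.c in 1.3.2), the pbuf is released to the mem heap and the segment goes back to the memp pool in the same call:

u8_t tcp_seg_free(struct tcp_seg *seg)
{
  u8_t count = 0;
  if (seg != NULL) {
    if (seg->p != NULL) {
      count = pbuf_free(seg->p);    /* releases the data/header pbuf */
    }
    memp_free(MEMP_TCP_SEG, seg);   /* returns the segment to the pool */
  }
  return count;
}

So if a tcp_seg leaks, the pbuf it points to (and therefore its mem heap area) leaks with it.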

 

Has anyone ever seen such a problem?

Any suggestion on how to solve it?

I also read the lwIP memp stats:

 

lwip_stats.memp[i].max
lwip_stats.memp[i].avail
lwip_stats.memp[i].used

 

and what I found for TCP_SEG was even 12, 12, 12, so all memp blocks used!
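
For reference, this is the kind of dump I read over the serial line (a minimal sketch; print_tcpseg_stats is my own helper, and it needs LWIP_STATS and MEMP_STATS enabled in lwipopts.h):

#include <stdio.h>
#include "lwip/memp.h"
#include "lwip/stats.h"

/* Dump the TCP_SEG pool counters. */
void print_tcpseg_stats(void)
{
  printf("TCP_SEG: avail=%u used=%u max=%u\n",
         (unsigned)lwip_stats.memp[MEMP_TCP_SEG].avail,
         (unsigned)lwip_stats.memp[MEMP_TCP_SEG].used,
         (unsigned)lwip_stats.memp[MEMP_TCP_SEG].max);
}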

 

I have one idea, but I don't know whether it might create worse problems. It is not a solution, because I don't know the real cause, but it is a sort of sanity recovery of the TCP_SEG blocks.

Looking at the example posted above, I could chain the two lost blocks (10 and 8) to the top of the list and the chained blocks (7 and 11) to the bottom of the list. This way I can recover at least the lost blocks. The chained blocks (7 and 11) can, in theory, still be used and freed; or at least, I don't know whether they are really in use or lost.

So, the result should be:

7 (chained) -> 11 (lost) -> 9 (free) -> 6 -> 5 -> 4 -> 3 -> 2 -> 1 -> 0 -> 8 (lost) -> 10 (lost)

This, of course, must be done by hand.

For blocks 8 and 10 I suppose I also have to call the mem_free function on the block->p area.
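
Something like this is what I have in mind (a very rough, untested sketch; lost[] is a hypothetical array filled by a scan like the one above, MEMP_SIZE is the aligned memp header size from memp.c, and I use pbuf_free() on seg->p rather than raw mem_free since the pbuf may have a reference count):

/* Push blocks identified as lost back onto the free list.
 * Must run while the stack is otherwise idle; place it in memp.c. */
void memp_reclaim_lost(memp_t type, struct memp **lost, int n)
{
  int i;
  for (i = 0; i < n; i++) {
    if (type == MEMP_TCP_SEG) {
      /* the payload follows the memp header: free its pbuf first */
      struct tcp_seg *seg = (struct tcp_seg *)((u8_t *)lost[i] + MEMP_SIZE);
      if (seg->p != NULL) {
        pbuf_free(seg->p);
        seg->p = NULL;
      }
    }
    lost[i]->next = memp_tab[type];   /* chain to the current head */
    memp_tab[type] = lost[i];
  }
}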

 

Is it a good idea?

Again, does anybody know this problem, or what the hell I have done to create it?

Another problem. I don't know if it is related; maybe it is the same problem with different effects.

The tcpip_thread stalls!

 

static void
tcpip_thread(void *arg)
{
...
  while (1) {                          /* MAIN loop */
    gusTcpThread++;      // DT 03/03/2011 Debug
    gucStatusTCPIP = 0;  // DT 2011/03/04 TEST
    sys_mbox_fetch(mbox, (void *)&msg);
    gucStatusTCPIP = 1;  // DT 2011/03/04 TEST
    switch (msg->type) {
#if LWIP_NETCONN
    case TCPIP_MSG_API:
      LWIP_DEBUGF(TCPIP_DEBUG, ("tcpip_thread: API message %p\n", (void *)msg));
      gucStatusTCPIP = 2;  // DT 2011/03/04 TEST
      msg->msg.apimsg->function(&(msg->msg.apimsg->msg));
      gucStatusTCPIP = 3;  // DT 2011/03/04 TEST
      break;
#endif /* LWIP_NETCONN */
...
}

 

What I see is that the gusTcpThread counter stops. In that case the debug variable gucStatusTCPIP is 2, so the thread stalls inside the call to the API function. I don't know which function it is or which mbox it relates to.
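
To catch the stall automatically I could add a small watchdog task around those debug variables (a sketch only; the task, its period, and the variable types are my assumptions):

#include <stdio.h>
#include "FreeRTOS.h"
#include "task.h"

extern volatile unsigned short gusTcpThread;   /* incremented once per loop */
extern volatile unsigned char  gucStatusTCPIP; /* last position in the loop */

static void watchdog_task(void *arg)
{
  unsigned short last = gusTcpThread;
  (void)arg;
  for (;;) {
    vTaskDelay(5000 / portTICK_RATE_MS);   /* check every ~5 s */
    if (gusTcpThread == last) {
      /* no progress since the last check: report where it stalled */
      printf("tcpip_thread stalled, gucStatusTCPIP=%u\n",
             (unsigned)gucStatusTCPIP);
    }
    last = gusTcpThread;
  }
}

On the posting side, here is my sys_mbox_post port: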

 

// Posts the "msg" to the mailbox. This function has to block until the "msg"
// is really posted.
void sys_mbox_post(sys_mbox_t mbox, void *msg)
{
  // NOTE: we assume mbox != SYS_MBOX_NULL; iow, we assume the calling function
  // takes care of checking the mbox validity before calling this function.
  while (pdTRUE != xQueueSend(mbox, &msg, SYS_ARCH_BLOCKING_TICKTIMEOUT))
  {
    vTaskDelay(10);    // DT 08/03/2011 Debug
    gusCntMBoxFull++;  // DT 03/03/2011 Debug
  }
  gusCntMBoxFull = 0;  // DT 03/03/2011 Debug
}

 

In the normal case gusCntMBoxFull is supposed to stay 0. If the tcpip thread is locked (it is the only one that pops the queue), the queue keeps being filled until it is full, and that while loop becomes an infinite loop.
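
As a diagnostic (not a fix), the retry loop could be bounded so that a permanently full mailbox is reported instead of spinning silently; a sketch, with an arbitrarily chosen threshold:

void sys_mbox_post(sys_mbox_t mbox, void *msg)
{
  int retries = 0;
  while (pdTRUE != xQueueSend(mbox, &msg, SYS_ARCH_BLOCKING_TICKTIMEOUT))
  {
    vTaskDelay(10);          /* 10 ticks between attempts, as above */
    if (++retries > 1000) {  /* roughly 10 s at a 1 ms tick */
      /* the consumer is almost certainly dead: report instead of spinning */
      LWIP_ASSERT("sys_mbox_post: mailbox stuck full", 0);
      retries = 0;
    }
  }
}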

 

Any ideas? Do you think these two problems are the same problem with two different effects? Consider that this problem also happens in the same situation: web server and portal both active.

One last piece of information: I compile with optimization -O1. I am going to try -O0, but I would have to remove pieces of code, so it is not a simple job.

 

Best regards

Davide

 

 

