lwip-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lwip-users] 1.4 rc1 non-blocking issues


From: Simon Goldschmidt
Subject: Re: [lwip-users] 1.4 rc1 non-blocking issues
Date: Thu, 02 Dec 2010 08:14:46 +0100

Kieran Mansley <address@hidden> wrote:
> > 1. ERR_WOULDBLOCK is treated as a FATAL error - it seems as if someone
> > forgot to update the ERR_IS_FATAL macro when the error code was added. A
> > non-blocking operation that sets the conn error to WOULDBLOCK (e.g
> > send() and recv() ) renders the socket unusable. Our workaround was to
> > use ERR_WOULDBLOCK in the ERR_IS_FATAL macro instead of ERR_VAL.
> 
> I agree that looks wrong.  I'm sure it was correct in the past.  I'll make
> sure this is fixed before 1.4.0 is released.  If you could file a bug for
> this on savannah it would help.

I've just fixed that in CVS, thanks for reporting. (I can still not add a new 
bug to savannah...)

> > 2. As far as we know, EMSGSIZE is not a valid return code for send() on
> > a STREAM socket. netconn_write does not return the number of bytes
> > processed and cannot perform partial sends. This makes an application
> > that uses select run in tight loops since select returns writable, but
> > send [working on an all or nothing assumption] returns an error
> > (EWOULDBLOCK)
> 
> I'd like to see lwIP updated to support partial sends (which I think would
> solve this problem) but it won't happen before 1.4.0.  A task on savannah
> would again be very helpful, if there's not one already - I think this
> might have been discussed before.


There's bug #31084 for that on savannah: https://savannah.nongnu.org/bugs/?31084
I've targeted it at 1.4.0 though. I think it's one of the main missing things 
for lwIP to support nonblocking sockets, so it might be worth including that to 
prevent having to launch 1.4.1 too soon...

> > 3. connect has several problems:
> > 
> >     a. connect sets sock->err to EINPROGRESS. When select returns
> > writable, getsockopt(SO_ERROR) will never let us know what happened [i.e
> > no access to conn->err] since getsockopt(SO_ERROR) does not return the
> > error value when sock->err is not 0 (it is set to EINPROGRESS). It seems
> > to me the non-blocking path lacks the propagation of the connect result
> > to sock->err (which does happen when using a blocking call).

Could you try CVS HEAD? There was a bug there 
(http://savannah.nongnu.org/bugs/?31590) which should be fixed since a week or 
so now.

> >     b. getsockopt(SO_ERROR) - behaviour according to Posix is to return
> > and clear the _pending_ error for the socket (if one exists). instead
> > getsockopt returns the last socket call error once. If additional calls
> > are made netconn's last error is returned repeatedly.

That's a problem of the mixture of 3 APIs here: socket API is built on top of 
the netconn API which in turn is built on top of the raw API. In a fatal error 
state, the socket could have an error that cannot easily be removed, so not 
returning that first could be a problem. This code needs further thoughts, so 
it would be best to open a bug entry for discussion (as soon as savannah is 
fully back online again).

> >     c. if connect is called again while a previous non-blocking connect
> > is being processed, ERR_ISCONN is assigned to conn->err [which by the
> > way translates to an errno of -1]. Now, if the connection succeeds,
> > do_connected will not be able to set conn->err to ERR_OK since it checks
> > for ERR_INPROGRESS. To make things worse, ERR_ISCONN is treated as a
> > FATAL error, and will therefore render the socket unusable. According to
> > Posix, EALREADY should be returned while a connect is in progress, and
> > EISCONN should be returned when a socket is connected.

That's a good one. Again, please file a bug report to make sure this doesn't 
get lost!

> > 4. lwip_select seems to be susceptible to race conditions and has issued
> > many ASSERTs as well as crashed.
> 
> These sounds like you're using a socket from two different threads (one
> calling select(), the other calling close()).  Unfortunately this style of
> operation isn't supported in lwIP.

Hmm, select() does use a global list to queue all threads waiting in a select 
(select_cb_list), but for me, that worked quite well, so any detailed bug 
report would be great.

Thanks for the feedback, and please remember to open bug reports for the open 
issues: there are too many posts on this list to keep track of bugs without the 
bugtracker.

Simon

Simon
-- 
GMX DSL Doppel-Flat ab 19,99 &euro;/mtl.! Jetzt auch mit 
gratis Notebook-Flat! http://portal.gmx.net/de/go/dsl



reply via email to

[Prev in Thread] Current Thread [Next in Thread]