Re: [lwip-users] 1.4 rc1 non-blocking issues
From: Simon Goldschmidt
Subject: Re: [lwip-users] 1.4 rc1 non-blocking issues
Date: Thu, 02 Dec 2010 08:14:46 +0100
Kieran Mansley <address@hidden> wrote:
> > 1. ERR_WOULDBLOCK is treated as a FATAL error - it seems as if someone
> > forgot to update the ERR_IS_FATAL macro when the error code was added. A
> > non-blocking operation that sets the conn error to WOULDBLOCK (e.g.
> > send() and recv() ) renders the socket unusable. Our workaround was to
> > use ERR_WOULDBLOCK in the ERR_IS_FATAL macro instead of ERR_VAL.
>
> I agree that looks wrong. I'm sure it was correct in the past. I'll make
> sure this is fixed before 1.4.0 is released. If you could file a bug for
> this on savannah it would help.
I've just fixed that in CVS, thanks for reporting. (I still cannot add a new
bug to savannah...)
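
For readers following along, the idea behind the macro can be sketched as below. This is only an illustration: the SKETCH_ names and the numeric values are made up and are not copied from lwIP's err.h. The one point it is meant to show is that the threshold in the comparison has to be the would-block code (or something below it), so that a would-block result is never classified as fatal.

```c
/* Hypothetical error codes, ordered so that "more negative" means
 * "more severe". Names and values are illustrative only and are NOT
 * taken from lwIP's err.h. */
typedef enum {
  SKETCH_ERR_OK         =  0,
  SKETCH_ERR_MEM        = -1,
  SKETCH_ERR_INPROGRESS = -2,
  SKETCH_ERR_VAL        = -3,
  SKETCH_ERR_WOULDBLOCK = -4,  /* last non-fatal code */
  SKETCH_ERR_ABRT       = -5,  /* first fatal code */
  SKETCH_ERR_RST        = -6,
  SKETCH_ERR_CLSD       = -7
} sketch_err_t;

/* Everything strictly below WOULDBLOCK is fatal; WOULDBLOCK itself is not. */
#define SKETCH_ERR_IS_FATAL(e) ((e) < SKETCH_ERR_WOULDBLOCK)
```

With this ordering, adding a new non-fatal code means placing it above the threshold and, if it becomes the lowest non-fatal value, updating the macro at the same time.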
> > 2. As far as we know, EMSGSIZE is not a valid return code for send() on
> > a STREAM socket. netconn_write does not return the number of bytes
> > processed and cannot perform partial sends. This makes an application
> > that uses select run in tight loops since select returns writable, but
> > send [working on an all or nothing assumption] returns an error
> > (EWOULDBLOCK)
>
> I'd like to see lwIP updated to support partial sends (which I think would
> solve this problem) but it won't happen before 1.4.0. A task on savannah
> would again be very helpful, if there's not one already - I think this
> might have been discussed before.
There's bug #31084 for that on savannah: https://savannah.nongnu.org/bugs/?31084
I've targeted it at 1.4.0 though. I think it's one of the main missing things
for lwIP to support nonblocking sockets, so it might be worth including that to
prevent having to launch 1.4.1 too soon...
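
The difference between the two behaviours discussed in point 2 can be modelled in a few lines. This is a toy sketch, not lwIP code: struct sketch_sndbuf and both functions are hypothetical and only model "how many bytes does a send accept, given the free space in the send buffer".

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a connection's send buffer. */
struct sketch_sndbuf {
  size_t free_bytes;  /* space currently available */
};

/* All-or-nothing: fails unless the whole request fits. */
static long sketch_send_all_or_nothing(struct sketch_sndbuf *sb, size_t len,
                                       int *err)
{
  if (len > sb->free_bytes) {
    *err = EWOULDBLOCK;
    return -1;
  }
  sb->free_bytes -= len;
  return (long)len;
}

/* Partial send: accepts as much as fits and reports would-block
 * only when nothing at all can be accepted. */
static long sketch_send_partial(struct sketch_sndbuf *sb, size_t len,
                                int *err)
{
  size_t n = (len < sb->free_bytes) ? len : sb->free_bytes;
  if (n == 0 && len > 0) {
    *err = EWOULDBLOCK;
    return -1;
  }
  sb->free_bytes -= n;
  return (long)n;
}
```

In the all-or-nothing model, a buffer with some free space is reported as writable, yet a larger request still fails, which is the tight loop described in the report. In the partial model the same request makes progress, and the caller retries with the remainder.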
> > 3. connect has several problems:
> >
> > a. connect sets sock->err to EINPROGRESS. When select returns
> > writable, getsockopt(SO_ERROR) will never let us know what happened [i.e.
> > no access to conn->err] since getsockopt(SO_ERROR) does not return the
> > error value when sock->err is not 0 (it is set to EINPROGRESS). It seems
> > to me the non-blocking path lacks the propagation of the connect result
> > to sock->err (which does happen when using a blocking call).
Could you try CVS HEAD? There was a bug there
(http://savannah.nongnu.org/bugs/?31590) which has been fixed for about a week
now.
> > b. getsockopt(SO_ERROR) - behaviour according to Posix is to return
> > and clear the _pending_ error for the socket (if one exists). instead
> > getsockopt returns the last socket call error once. If additional calls
> > are made netconn's last error is returned repeatedly.
That's a problem caused by the mixture of 3 APIs here: the socket API is built
on top of the netconn API, which in turn is built on top of the raw API. In a
fatal error state, the socket could have an error that cannot easily be
removed, so not returning that first could be a problem. This code needs
further thought, so it would be best to open a bug entry for discussion (as
soon as savannah is fully back online again).
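
The "return and clear" behaviour that the report expects from getsockopt(SO_ERROR) can be modelled as below. This is a toy sketch with hypothetical names (struct sketch_sock, sketch_take_so_error); it deliberately ignores the fatal-error case raised above, where an error on the underlying connection cannot simply be cleared.

```c
#include <errno.h>

/* Hypothetical socket state: a single pending (asynchronous) error slot. */
struct sketch_sock {
  int pending_err;  /* 0 means no error is pending */
};

/* Returns the pending error and clears it, so a second call reports 0. */
static int sketch_take_so_error(struct sketch_sock *s)
{
  int e = s->pending_err;
  s->pending_err = 0;
  return e;
}
```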
> > c. if connect is called again while a previous non-blocking connect
> > is being processed, ERR_ISCONN is assigned to conn->err [which by the
> > way translates to an errno of -1]. Now, if the connection succeeds,
> > do_connected will not be able to set conn->err to ERR_OK since it checks
> > for ERR_INPROGRESS. To make things worse, ERR_ISCONN is treated as a
> > FATAL error, and will therefore render the socket unusable. According to
> > Posix, EALREADY should be returned while a connect is in progress, and
> > EISCONN should be returned when a socket is connected.
That's a good one. Again, please file a bug report to make sure this doesn't
get lost!
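
The errno behaviour described in point 3c can be sketched as a small state machine. The types and function names are hypothetical and this is not how lwIP stores its state; it only illustrates that a repeated connect() reports EALREADY or EISCONN without overwriting the state that the completion callback later checks.

```c
#include <errno.h>

enum sketch_conn_state { SKETCH_IDLE, SKETCH_CONNECTING, SKETCH_CONNECTED };

struct sketch_conn {
  enum sketch_conn_state state;
};

/* Models a non-blocking connect() call; returns the errno value the
 * caller would see. The state is only changed when a new connect starts. */
static int sketch_connect(struct sketch_conn *c)
{
  switch (c->state) {
  case SKETCH_CONNECTING:
    return EALREADY;     /* a previous connect is still in progress */
  case SKETCH_CONNECTED:
    return EISCONN;      /* the socket is already connected */
  default:
    c->state = SKETCH_CONNECTING;
    return EINPROGRESS;  /* connect started, result comes later */
  }
}

/* Models the completion callback: it still finds the CONNECTING state,
 * because a repeated sketch_connect() did not clobber it. */
static void sketch_on_connected(struct sketch_conn *c)
{
  if (c->state == SKETCH_CONNECTING) {
    c->state = SKETCH_CONNECTED;
  }
}
```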
> > 4. lwip_select seems to be susceptible to race conditions and has issued
> > many ASSERTs as well as crashed.
>
> This sounds like you're using a socket from two different threads (one
> calling select(), the other calling close()). Unfortunately this style of
> operation isn't supported in lwIP.
Hmm, select() does use a global list to queue all threads waiting in a select
(select_cb_list), but for me, that worked quite well, so any detailed bug
report would be great.
Thanks for the feedback, and please remember to open bug reports for the open
issues: there are too many posts on this list to keep track of bugs without the
bugtracker.
Simon