I don't think the ethernet DMA is an issue. We are using the Coldfire
5282, for which we helped instigate errata for the FEC (incoming) pbufs.
That seems to have been worked around, though. I'll keep looking. Thanks
for your help.
Tom
-----Original Message-----
From: address@hidden
[mailto:address@hidden]On Behalf
Of Jim Gibbons
Sent: Friday, March 04, 2005 12:34 PM
To: Mailing list for lwIP users
Subject: Re: [lwip-users] FTP-DATA exchange: TCP issues
I was in error to suggest this problem. At the time that I saw this
problem, the folks in question were running 0.6.3. In that version, the
user was responsible for the timer, and the usual implementation just
left it running, whether needed or not.
I can see what you mean about the use of the timer currently. It should
get launched from the tcpip thread when needed, and that should preclude
problems. Sorry about the confusion.
One other thing that had been an issue around that time was data cache
coherency problems related to the ethernet DMA. We eventually turned off
their data cache to avoid the confusion. Any chance that you have such a
problem?
Tom C. Barker wrote:
Jim,
Not barging in at all, Jim. On the contrary, thanks for the response. I
can confirm I am using lightweight protection, and I will take a look at
the timer call. The call to the tcp timer is made only when the timer is
needed, though. What would be the significance of the initial call to
sys_timeout if there is no tcp connection and no need for a tcp timer at
startup? It would seem that a call to the tcp timer would result in it
firing once, finding no need to fire again, and never rescheduling.
Thanks again,
Tom
Pardon me again for barging in. Kieran's analysis, particularly
regarding an unmotivated retransmit, sounded very familiar. I had a
problem like this at one of my clients. We changed two things and it
then went away.
First, we found and fixed a problem with the tcp_tmr. It was running in
the wrong task context. It must run in the tcpip thread. The usual
method for doing this is to make the initial call to sys_timeout from
within the callback function that executes when tcpip initialization is
done.
Second, we found that we weren't using the lightweight protection option
that I mentioned to you earlier.
I think it was actually the first thing that was causing the retransmit
problem, but we never found out for sure. It's really difficult to track
down resource conflicts. When the problem went away, we stopped working
on it.
Tom C. Barker wrote:
Thanks for your analysis Kieran. Forgive my assessment of
what ACKs are what: I was speaking of the multiple ACKs
the client sends back. ".65", the problem node, is in fact
the lwIP ftp server.
I have all my DEBUG statements on and find that I never get
a tcp_enqueue of the missing packet. It just skips over it.
My only priority is this issue right now so if you or anyone
has any ideas of what I can watch for, I am open to ideas. Meanwhile
I'm crafting a bit-patterned file to help identify where the
problem is occurring.
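One way such a bit-patterned file could be built, as a minimal Python sketch rather than the file actually used in this thread: every aligned 4-byte word stores its own file offset, so any payload seen in a capture identifies where in the file it came from. The function names and the 4-byte big-endian word layout are assumptions made for illustration.

```python
import struct

WORD = 4  # bytes per pattern word (an illustrative choice)

def make_pattern(size):
    """Return `size` bytes in which each aligned 4-byte word holds its
    own file offset as a big-endian unsigned integer."""
    words = (size + WORD - 1) // WORD
    data = b"".join(struct.pack(">I", i * WORD) for i in range(words))
    return data[:size]

def offset_of_payload(payload):
    """Read the file offset encoded in the first word of a payload.
    Assumes the payload starts on a 4-byte boundary of the file."""
    return struct.unpack(">I", payload[:WORD])[0]
```

Writing `make_pattern(400 * 1024)` to a file gives a test file of roughly the 400k size mentioned earlier. Calling `offset_of_payload` on a captured payload then shows which part of the file that packet actually carried, provided the payload begins on a word boundary.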
Tom
-----Original Message-----
From: address@hidden
[mailto:address@hidden]On Behalf
Of Kieran Mansley
Sent: Friday, March 04, 2005 1:29 AM
To: Mailing list for lwIP users
Subject: Re: [lwip-users] FTP-DATA exchange: TCP issues
On Thu, 2005-03-03 at 09:54 -0800, Tom C. Barker wrote:
Hello,
Maybe to short-circuit this issue, I am working with
0.7.2 and am in the process of moving to 1.1.0 so if
the following problem resembles a bug prior to 1.1.0,
please let me know.
In testing an ftp implementation where I will occasionally
successfully transfer a 400k file, I have come across a
consistently reproducible issue where my lwIP ftp server
seems to have dropped an ACK in that according to the
attached (truncated-packets) ethereal file, the packet on
line 249 should have ACK'd 264364, but instead ACKs 267284.
The rest of the (doomed) transaction is spent trying to
shoehorn in a few packets to the client's unacked queue.
Your description doesn't seem to match the trace that you've attached.
There is no packet there that ACKs 267284.
However, there is clearly something going wrong in that data transfer.
The problem seems to me to start with packet 245, which (i) is a
retransmission (of packet 242) when none seems necessary and (ii)
doesn't have the same payload as the earlier transmission of the same
data. Looks to me like packet 245 has got the wrong sequence number on
it, and it is in fact the payload of the next in-order packet.
Something similar happens with packet 244 and 247: 247 is a
retransmission of 244, but would not seem to be necessary, and this time
they both have the same payload.
What's more worrying is that the ".65" node then fails to retransmit the
correct data when it should: it gets many duplicate acknowledgements for
264364, which should lead it to retransmit that packet, but it refuses.
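For reference, the behaviour expected here is TCP fast retransmit: the sender counts ACKs that repeat the same acknowledgement number and resends the segment starting at that number once the third duplicate arrives. A simplified Python sketch of that counting follows. The class and method names are hypothetical, this is not lwIP's code, and a real stack also checks further conditions (for example that the ACK carries no data and that unacknowledged data is outstanding).

```python
DUPACK_THRESHOLD = 3  # standard fast-retransmit threshold

class DupAckCounter:
    """Tracks duplicate ACKs for a single connection (illustrative only)."""

    def __init__(self):
        self.last_ack = None
        self.dupacks = 0

    def on_ack(self, ackno):
        """Return the sequence number to retransmit, or None."""
        if ackno == self.last_ack:
            self.dupacks += 1
            if self.dupacks == DUPACK_THRESHOLD:
                return ackno  # resend the segment that starts at ackno
        else:
            # New acknowledgement number: remember it and reset the count.
            self.last_ack = ackno
            self.dupacks = 0
        return None
```

Feeding the ACK numbers from a capture into `on_ack` in order shows the point at which a retransmission of 264364 would normally be expected.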
I can't explain this in full, but hopefully that will give you some
clues about what might be wrong. You could compare the captured
payloads against the file that is being transferred to check my theory
about 245 having the wrong sequence number.
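The comparison suggested above could be sketched in Python roughly as follows. It assumes the payload bytes have already been extracted from the capture and that the caller has converted the packet's sequence number into an offset within the file; both function names are hypothetical.

```python
def find_all(file_data, payload):
    """Return every offset in file_data at which payload occurs."""
    offsets = []
    pos = file_data.find(payload)
    while pos != -1:
        offsets.append(pos)
        pos = file_data.find(payload, pos + 1)
    return offsets

def check_segment(file_data, payload, expected_offset):
    """Compare a captured payload with the file at the offset implied by
    its sequence number.

    Returns (matches, offsets): whether the payload matches the file at
    expected_offset, and every offset where the payload really occurs."""
    expected = file_data[expected_offset:expected_offset + len(payload)]
    return expected == payload, find_all(file_data, payload)
```

A result of `(False, [n])` with `n` equal to the offset of the next in-order segment would support the theory that the packet carries the right data under the wrong sequence number. With an arbitrary file the same bytes may occur at several offsets, so a file with unique contents at every position makes the result easier to read.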
Kieran
_______________________________________________
lwip-users mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/lwip-users
--
Jim Gibbons | address@hidden
Gibbons and Associates, Inc. | TEL: (408) 984-1441
900 Lafayette, Suite 704, Santa Clara, CA | FAX: (408) 247-6395