From: Bill Auerbach
Subject: RE: [lwip-users] How to optimize raw UDP performance
Date: Thu, 24 Sep 2009 14:54:16 -0400

There's some risk in disabling the UDP checksum, but it's low. From what I see, UDP loss happens at the granularity of whole packets, not corrupted bytes within packets. On Windows it can be bad: I've seen 30-40 consecutive packets dropped just by minimizing a window (even the window of an application other than the UDP-based one). On the other hand, I can run 150,000 packets a second for an hour without a drop (on a LAN, as a matter of fact). Be prepared to contend with the non-lwIP end of the connection at higher speeds.

Optimize your Ethernet driver. If you can't send UDP packets at 980+ Mbps, your driver isn't optimal. Although you don't need that speed, the faster each packet is sent, the better. Note: use a static copy of one UDP pbuf and send it repeatedly. This will help you: the later in the lwIP output path you make your call, the better. There is a HUGE difference between calling udp_sendto and calling etharp_query! The speed killers are the ARP lookup, the redundant address checks, a few pbuf_header calls, and small copies in these routines. I optimized etharp.c with a faster cache test, moved these functions to on-chip memory (this is a big win if you can do it), and replaced the SMEMCPYs and for-loop MAC copies with more efficient copies. Then, calling etharp_query, I got over 700 Mbps (100 MHz Cyclone III FPGA running Nios II; this may be close to your platform).
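The post doesn't show what the "faster cache test" in etharp.c looked like. One common way to get one is to remember which ARP entry matched last time and check it before scanning the table, so a stream of packets to a single destination skips the scan entirely. The sketch below assumes a hypothetical table layout: `struct arp_entry`, `arp_table`, `arp_lookup`, and `arp_last_hit` are made up for illustration and are not lwIP's etharp.c internals. It also uses a fixed-size `memcpy` for the 6-byte MAC instead of a byte-by-byte for-loop.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ARP table for illustration only; NOT lwIP's etharp_entry. */
#define ARP_TABLE_SIZE 10

struct arp_entry {
    uint32_t ip;      /* IPv4 address, host byte order */
    uint8_t  mac[6];  /* resolved Ethernet address */
    int      valid;   /* nonzero once the entry is resolved */
};

static struct arp_entry arp_table[ARP_TABLE_SIZE];
static int arp_last_hit = -1;    /* index of the most recent successful lookup */
static unsigned arp_full_scans;  /* counts slow-path table scans */

/* Look up ip; on success copy the MAC into mac_out and return 1, else 0.
 * Fast path: re-check the entry that matched last time before scanning. */
static int arp_lookup(uint32_t ip, uint8_t mac_out[6])
{
    assert(mac_out != NULL);

    if (arp_last_hit >= 0) {
        const struct arp_entry *e = &arp_table[arp_last_hit];
        /* Still verify ip and valid, so a stale index only costs a scan. */
        if (e->valid && e->ip == ip) {
            memcpy(mac_out, e->mac, 6); /* constant-size copy, often inlined */
            return 1;
        }
    }

    /* Slow path: linear scan of the whole table. */
    arp_full_scans++;
    for (int i = 0; i < ARP_TABLE_SIZE; i++) {
        if (arp_table[i].valid && arp_table[i].ip == ip) {
            arp_last_hit = i;
            memcpy(mac_out, arp_table[i].mac, 6);
            return 1;
        }
    }
    return 0; /* unresolved: a real stack would send an ARP request here */
}
```

Because the fast path still compares the IP address and the valid flag, a stale `arp_last_hit` can only cause a fallback scan; it never returns a MAC for the wrong address.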
Compare this to udp_sendto_if, which managed only about 325 Mbps. In the end I resorted to my own routines to build UDP packets (one pbuf with the IP/UDP header chained to the payload). With checksums disabled I get 969 Mbps. (I was aiming to get close to wire speed if possible.) I had to time this on the target side: Windows can keep up with this rate only in short bursts (250 packets or fewer), and Wireshark has some difficulties but will also capture short bursts. I timed 100 packets in Wireshark to validate the times recorded on the target. These times were taken with nothing else running in the system. My speeds reflect changes in several areas and do not reflect what is possible by changing *only* lwIP or *only* the driver.
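Bill's own packet-building routines aren't included in the post. As a rough illustration of the idea, the sketch below fills a 42-byte Ethernet + IPv4 + UDP header once into a buffer, computes the IPv4 header checksum (the RFC 1071 ones'-complement sum), and leaves the UDP checksum at 0, which RFC 768 allows for UDP over IPv4 to mean "no checksum computed". Assumptions: no VLAN tag, no IP options, a 1500-byte MTU, and IPv4 only (IPv6 generally requires a UDP checksum). All names (`build_udp_header`, `inet_checksum`, `put16`, `put32`) are hypothetical, not lwIP API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HLEN 14
#define IP_HLEN  20  /* no IP options assumed */
#define UDP_HLEN 8
#define HDR_LEN  (ETH_HLEN + IP_HLEN + UDP_HLEN)  /* 42 bytes */

/* RFC 1071 Internet checksum over len bytes (big-endian 16-bit words). */
static uint16_t inet_checksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(p[i] << 8 | p[i + 1]);
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;  /* pad an odd trailing byte */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);  /* fold carries back in */
    return (uint16_t)~sum;
}

/* Store values in network (big-endian) byte order. */
static void put16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)(v >> 8); p[1] = (uint8_t)v; }
static void put32(uint8_t *p, uint32_t v) { put16(p, (uint16_t)(v >> 16)); put16(p + 2, (uint16_t)v); }

/* Fill hdr once; with a fixed destination and payload length it can be
 * reused unchanged in front of every payload. */
static void build_udp_header(uint8_t hdr[HDR_LEN],
                             const uint8_t dst_mac[6], const uint8_t src_mac[6],
                             uint32_t src_ip, uint32_t dst_ip,
                             uint16_t src_port, uint16_t dst_port,
                             uint16_t payload_len)
{
    assert(payload_len <= 1500 - IP_HLEN - UDP_HLEN);  /* fits a 1500-byte MTU */

    uint8_t *eth = hdr, *ip = hdr + ETH_HLEN, *udp = ip + IP_HLEN;

    memcpy(eth, dst_mac, 6);
    memcpy(eth + 6, src_mac, 6);
    put16(eth + 12, 0x0800);           /* EtherType: IPv4 */

    memset(ip, 0, IP_HLEN);            /* also zeroes ID and checksum fields */
    ip[0] = 0x45;                      /* version 4, header length 5 words */
    put16(ip + 2, (uint16_t)(IP_HLEN + UDP_HLEN + payload_len));  /* total length */
    put16(ip + 6, 0x4000);             /* Don't Fragment flag */
    ip[8] = 64;                        /* TTL */
    ip[9] = 17;                        /* protocol: UDP */
    put32(ip + 12, src_ip);
    put32(ip + 16, dst_ip);
    put16(ip + 10, inet_checksum(ip, IP_HLEN));  /* computed with field zeroed */

    put16(udp, src_port);
    put16(udp + 2, dst_port);
    put16(udp + 4, (uint16_t)(UDP_HLEN + payload_len));
    put16(udp + 6, 0);                 /* 0 = no UDP checksum (IPv4 only) */
}
```

In a send loop, the header would be built once and placed in front of each payload (in lwIP terms, a header pbuf chained to the payload pbuf), so per-packet work drops to whatever fields actually change; with a fixed length and a constant IP ID, nothing changes at all.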
My goal was to make Ethernet communications as fast as possible, with no rules about what I could and could not change.

Bill

> -----Original Message-----
> From: address@hidden
> [mailto:address@hidden On Behalf Of Max Bobrov
> Sent: Thursday, September 24, 2009 1:17 PM
> To: Mailing list for lwIP users
> Subject: Re: [lwip-users] How to optimize raw UDP performance
>
> Bill: Thank you! Disabling CHECKSUM_CHECK_UDP and CHECKSUM_GEN_UDP gave
> a considerable increase in performance. The Xilinx GUI for lwIP
> could use some significant improvement to make this and many other
> features more accessible.
>
> Chris: I've increased some of these values (listed below) but haven't
> seen much improvement from that. Do these look OK, or have you had
> better success with others?
>
> #define MEM_ALIGNMENT 8
> #define MEM_SIZE 262144
> #define MEMP_NUM_PBUF 32
> #define MEMP_NUM_UDP_PCB 8
> #define MEMP_NUM_TCP_PCB 32
> #define MEMP_NUM_TCP_PCB_LISTEN 8
> #define MEMP_NUM_TCP_SEG 256
> #define LWIP_USE_HEAP_FROM_INTERRUPT 1
>
> #define MEMP_NUM_SYS_TIMEOUT 8
> #define PBUF_POOL_SIZE 256
> #define PBUF_POOL_BUFSIZE 2048
> #define PBUF_LINK_HLEN 16
>
>
> On Wed, Sep 23, 2009 at 11:06 PM, Chris Strahm <address@hidden> wrote:
>> Actually, someone else reported to me that turning the checksum off in
>> lwIP actually made it slower. I have not checked the reason for this, but
>> that was someone else's experience. There is a big difference in whether
>> you use 8/16/32-bit memcpy-type routines. Also if you can write it in asm.
>> Since yours is an FPGA, it's a little different. Also the same kind of
>> thing for the checksum: asm will be faster. Sometimes the difference in how
>> a particular variable or address pointer is generated by C can result in a
>> very big difference in code. You have to look at everything when it comes
>> to high performance.
>>
>> Also, what is the size of your PBUFs and of the blocks in your DMA or MAC
>> ISR? I assume for a 1G Enet system you probably want the maximum, about
>> 1536 each.
>>
>> Chris.
>>
>>
>>
>> _______________________________________________
>> lwip-users mailing list
>> address@hidden
>> http://lists.nongnu.org/mailman/listinfo/lwip-users