Message-ID: <49EE1C32.1060202@myri.com>
Date: Tue, 21 Apr 2009 15:19:14 -0400
From: Andrew Gallatin <gallatin@...i.com>
To: Herbert Xu <herbert@...dor.apana.org.au>
CC: David Miller <davem@...emloft.net>, brice@...i.com,
sgruszka@...hat.com, netdev@...r.kernel.org
Subject: Re: [PATCH] myr10ge: again fix lro_gen_skb() alignment
Herbert Xu wrote:
> On Wed, Apr 15, 2009 at 04:42:48PM -0700, David Miller wrote:
>> Herbert has been working on various optimizations to get
>> cxgb3 GRO performance on par with LRO. Perhaps he has
>> some things for you to try :-)
>
> Yes, this patch should improve performance. In fact, when you
> reopen the net-next tree feel free to put this patch in :)
>
> gro: New frags interface to avoid copying shinfo
<...>
Hi Herbert,
With a net-next tree pulled 2 hours ago, I can now see line rate when
using frags with myri10ge on my weakest machines when receiving a
1500b TCP stream. To achieve line rate on these machines with both
inet_lro and GRO, I must bind the netserver and device IRQ to
different CPUs. Unfortunately, CPU accounting seems to currently be
broken in the Linux kernel, so I cannot provide an accurate comparison
at line rate.
So to compare inet_lro and GRO, I'm binding the netserver and device IRQ
to the same CPU. When I do this, that CPU is saturated and GRO is
roughly 17% slower than inet_lro. For comparison, here are netperf
results from a fast peer sending to my weak machine (AMD Athlon(tm) 64
X2 Dual Core Processor 3800+, 2GHz). First inet_lro:
Recv   Send   Send                          Utilization      Service Demand
Socket Socket Message  Elapsed              Send     Recv    Send    Recv
Size   Size   Size     Time     Throughput  local    remote  local   remote
bytes  bytes  bytes    secs.    10^6bits/s  % S      % S     us/KB   us/KB
87380  65536  65536    60.02    6631.52     12.45    50.10   0.308   1.238
And now GRO:
87380  65536  65536    60.01    5488.99     9.79     50.00   0.292   1.492
Also, can you tell me how to handle, via GRO, a device like mine,
which passes up a simple 16-bit checksum computed over the entire
frame (excluding the first 14 bytes)? Simply setting
skb->ip_summed = CHECKSUM_COMPLETE leads to "hw csum failure".
I've attached my work-in-progress patch so you can see what I'm doing.
I do not want this applied due to performance and correctness issues.
Thanks for your help,
Drew
View attachment "myri10ge_gro.diff" of type "text/x-diff" (9040 bytes)