Date:	Tue, 21 Apr 2009 15:19:14 -0400
From:	Andrew Gallatin <gallatin@...i.com>
To:	Herbert Xu <herbert@...dor.apana.org.au>
CC:	David Miller <davem@...emloft.net>, brice@...i.com,
	sgruszka@...hat.com, netdev@...r.kernel.org
Subject: Re: [PATCH] myr10ge: again fix lro_gen_skb() alignment

Herbert Xu wrote:
 > On Wed, Apr 15, 2009 at 04:42:48PM -0700, David Miller wrote:
 >> Herbert has been working on various optimizations to get
 >> cxgb3 GRO performance on par with LRO.  Perhaps he has
 >> some things for you to try :-)
 >
 > Yes, this patch should improve performance.  In fact, when you
 > reopen the net-next tree feel free to put this patch in :)
 >
 > gro: New frags interface to avoid copying shinfo
<...>

Hi Herbert,

With a net-next tree pulled 2 hours ago, I can now see line rate when
using frags with myri10ge on my weakest machines when receiving a
1500b TCP stream.  To achieve line rate on these machines with both
inet_lro and GRO, I must bind the netserver and device IRQ to
different CPUs.  Unfortunately, CPU accounting seems to currently be
broken in the Linux kernel, so I cannot provide an accurate comparison
at line rate.

So to compare inet_lro and GRO, I'm binding the netserver and device IRQ
to the same CPU.  When I do this, that CPU is saturated and GRO is
roughly 17% slower than inet_lro.  For comparison, here are netperf
results from a fast peer sending to my weak machine (AMD Athlon(tm) 64
X2 Dual Core Processor 3800+, 2GHz).  First inet_lro:

Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

  87380  65536  65536    60.02      6631.52   12.45    50.10    0.308   1.238

And now GRO:
  87380  65536  65536    60.01      5488.99   9.79     50.00    0.292   1.492
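
The "roughly 17%" figure can be checked against the two netperf results above. The helper below is only an illustrative sketch; the name percent_change is hypothetical and appears in neither netperf nor the patch.

```c
#include <assert.h>

/* Relative change of `value` with respect to `baseline`, in percent.
 * A negative result means `value` is lower than `baseline`.
 * (Hypothetical helper, for illustration only.) */
double percent_change(double baseline, double value)
{
    assert(baseline > 0.0);
    return (value - baseline) / baseline * 100.0;
}
```

With the throughput numbers, percent_change(6631.52, 5488.99) is about -17.2, which matches the stated 17% drop. Applied to the receive-side service demand, percent_change(1.238, 1.492) is about +20.5, so GRO costs roughly a fifth more receiver CPU time per KB in this run.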

Also, can you tell me how to handle my device, which passes a simple
16-bit checksum across the entire frame (excluding first 14 bytes),
via GRO?  Simply setting skb->ip_summed = CHECKSUM_COMPLETE leads
to  "hw csum failure".
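
To make the device behaviour concrete, here is a minimal userspace sketch of a 16-bit checksum taken over a frame with the first 14 bytes (the Ethernet header) skipped. It assumes the "simple 16-bit checksum" is an Internet-style ones' complement sum as in RFC 1071, without the final inversion; the message does not say so explicitly. The names frame_csum_skip_eth and ETH_HDR_LEN are made up for this example and are not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN 14  /* Ethernet header bytes excluded from the sum */

/* Ones' complement 16-bit sum (RFC 1071 style, not inverted) over the
 * part of the frame that follows the Ethernet header.
 * Assumption: this models the value the device reports. */
uint16_t frame_csum_skip_eth(const uint8_t *frame, size_t len)
{
    assert(len >= ETH_HDR_LEN);

    const uint8_t *p = frame + ETH_HDR_LEN;
    size_t n = len - ETH_HDR_LEN;
    uint32_t sum = 0;

    /* Add the payload as big-endian 16-bit words. */
    while (n >= 2) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        n -= 2;
    }
    if (n)
        sum += (uint32_t)p[0] << 8;  /* pad an odd trailing byte with zero */

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)sum;
}
```

For a 22-byte frame whose last eight bytes are 00 01 f2 03 f4 f5 f6 f7 (the example words from RFC 1071), the function returns 0xddf2 regardless of what the 14 header bytes contain. How such a value has to be presented in skb->csum for CHECKSUM_COMPLETE under GRO is the open question here; the sketch only shows what is being summed.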

I've attached my work-in-progress patch so you can see what I'm doing.
I do not want this applied due to performance and correctness issues.

Thanks for your help,

Drew

View attachment "myri10ge_gro.diff" of type "text/x-diff" (9040 bytes)
