Message-ID: <49F861BF.7060403@myri.com>
Date: Wed, 29 Apr 2009 10:18:39 -0400
From: Andrew Gallatin <gallatin@...i.com>
To: Eric Dumazet <dada1@...mosbay.com>
CC: Herbert Xu <herbert@...dor.apana.org.au>,
David Miller <davem@...emloft.net>, brice@...i.com,
sgruszka@...hat.com, netdev@...r.kernel.org
Subject: Re: [PATCH] myr10ge: again fix lro_gen_skb() alignment
Eric Dumazet wrote:
> Andrew Gallatin wrote:
>> Andrew Gallatin wrote:
>>> For variety, I grabbed a different "slow" receiver. This is another
>>> 2-CPU machine, but a dual-socket, single-core Opteron (Tyan S2895).
>>>
>>> processor : 0
>>> vendor_id : AuthenticAMD
>>> cpu family : 15
>>> model : 37
>>> model name : AMD Opteron(tm) Processor 252
>> <...>
>>> The sender was an identical machine running an ancient RHEL4 kernel
>>> (2.6.9-42.ELsmp) and our downloadable (backported) driver.
>>> (http://www.myri.com/ftp/pub/Myri10GE/myri10ge-linux.1.4.4.tgz)
>>> I disabled LRO on the sender.
>>>
>>> Binding the IRQ to CPU0 and the netserver to CPU1, I see 8.1Gb/s with
>>> LRO and 8.0Gb/s with GRO.
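For reference, the binding can be scripted; a minimal Python sketch, where the
IRQ number and the netserver pid are placeholders rather than values from this
box:

import os

IRQ_NUM = 50            # placeholder: the myri10ge receive IRQ on this box
NETSERVER_PID = 1234    # placeholder: pid of the running netserver

# /proc/irq/<n>/smp_affinity takes a hex CPU mask; 0x1 selects CPU0
with open(f"/proc/irq/{IRQ_NUM}/smp_affinity", "w") as f:
    f.write("1\n")

# Pin netserver to CPU1 (same effect as taskset -cp 1 <pid>)
os.sched_setaffinity(NETSERVER_PID, {1})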
>> With the recent patch to fix idle CPU time accounting from LKML applied,
>> it is again possible to trust netperf's service demand (based on %CPU).
>> So here is raw netperf output for LRO and GRO, bound as above.
>>
>> TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
>> hail1-m.sw.myri.com (10.0.130.167) port 0 AF_INET : cpu bind
>> Recv   Send    Send                          Utilization       Service Demand
>> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
>> Size   Size    Size     Time     Throughput  local    remote   local   remote
>> bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
>>
>> LRO:
>>  87380  65536  65536    60.00      8279.36   8.10     77.55    0.160   1.535
>> GRO:
>>  87380  65536  65536    60.00      8053.19   7.86     85.47    0.160   1.739
>>
>> The difference is bigger if you disable TCP timestamps (and thus shrink
>> the packet headers down so they require fewer cachelines):
>> LRO:
>>  87380  65536  65536    60.02      7753.55   8.01     74.06    0.169   1.565
>> GRO:
>>  87380  65536  65536    60.02      7535.12   7.27     84.57    0.158   1.839
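To put rough numbers on the "fewer cachelines" point above (my own arithmetic,
assuming plain Ethernet/IPv4 framing and 64-byte cachelines, not something
measured here):

ETH, IP, TCP, TS_OPT, CACHELINE = 14, 20, 20, 12, 64
with_ts    = ETH + IP + TCP + TS_OPT   # 66 bytes: spills into a second 64B cacheline
without_ts = ETH + IP + TCP            # 54 bytes: fits within a single cacheline
print(with_ts, without_ts)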
>>
>>
>> As you can see, even though the raw bandwidth is very close, the
>> service demand makes it clear that GRO is more expensive
>> than LRO. I just wish I understood why.
>>
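For what it's worth, the receive-side service demand above is consistent with
simply dividing busy CPU time by data moved; a quick sanity check in Python,
assuming netperf scales %CPU by the 2 CPUs in the receiver and counts KB as
1024 bytes:

def service_demand_us_per_kb(throughput_mbps, cpu_util_pct, num_cpus=2):
    kb_per_sec = throughput_mbps * 1e6 / 8 / 1024              # KB received per second
    busy_us_per_sec = (cpu_util_pct / 100.0) * num_cpus * 1e6  # busy CPU-usec per second
    return busy_us_per_sec / kb_per_sec

lro = service_demand_us_per_kb(8279.36, 77.55)   # ~1.53 us/KB, matches the 1.535 above
gro = service_demand_us_per_kb(8053.19, 85.47)   # ~1.74 us/KB, matches the 1.739 above
print(lro, gro, (gro - lro) / lro)               # GRO costs ~13% more CPU per KB received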
>
> What are the "vmstat 1" outputs for both tests? Any difference in, say,
> context switches?
Not much difference is apparent from vmstat, except for a
lower load and slightly higher IRQ rate from LRO:
LRO:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa st
 1  0      0 676960  19280 209812    0    0     0     0 14817    24  0 73 27  0  0
 1  0      0 677084  19280 209812    0    0     0     0 14834    20  0 73 27  0  0
 1  0      0 676916  19280 209812    0    0     0     0 14833    16  0 74 26  0  0
GRO:
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa st
 1  0      0 678244  18008 209784    0    0     0    24 14288    32  0 84 16  0  0
 1  0      0 678268  18008 209788    0    0     0     0 14403    22  0 85 15  0  0
 1  0      0 677956  18008 209788    0    0     0     0 14331    20  0 84 16  0  0
The real difference shows up mainly in mpstat on the CPU handling the
interrupts, where you can see that softirq time is much higher:
LRO:
07:15:16  CPU  %user  %nice   %sys %iowait   %irq  %soft %steal  %idle   intr/s
07:15:17    0   0.00   0.00   0.00    0.00   0.00  45.00   0.00  55.00 12907.92
07:15:18    0   0.00   0.00   1.00    0.00   2.00  43.00   0.00  54.00 12707.92
07:15:19    0   0.00   0.00   1.00    0.00   0.00  46.00   0.00  53.00 12825.00
GRO:
07:11:59  CPU  %user  %nice   %sys %iowait   %irq  %soft %steal  %idle   intr/s
07:12:00    0   0.00   0.00   0.00    0.00   0.99  66.34   0.00  32.67 12242.57
07:12:01    0   0.00   0.00   0.00    0.00   1.01  66.67   0.00  32.32 12220.00
07:12:02    0   0.00   0.00   0.99    0.00   0.99  65.35   0.00  32.67 12336.00
So it looks like "something" GRO is doing in softirq context is more
expensive than what LRO is doing.
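One way to corroborate the mpstat %soft numbers is to sample /proc/stat
directly for CPU0; a minimal sketch, assuming the usual field order
(user nice system idle iowait irq softirq ...):

import time

def cpu0_softirq_share(interval=1.0):
    def snap():
        with open("/proc/stat") as f:
            for line in f:
                if line.startswith("cpu0 "):
                    return [int(x) for x in line.split()[1:]]
    before = snap()
    time.sleep(interval)
    after = snap()
    delta = [b - a for a, b in zip(before, after)]
    return delta[6] / sum(delta)    # softirq ticks / all ticks on cpu0

print(f"cpu0 softirq share: {cpu0_softirq_share():.1%}")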
Drew