Message-ID: <49F861BF.7060403@myri.com>
Date: Wed, 29 Apr 2009 10:18:39 -0400
From: Andrew Gallatin <gallatin@...i.com>
To: Eric Dumazet <dada1@...mosbay.com>
CC: Herbert Xu <herbert@...dor.apana.org.au>,
David Miller <davem@...emloft.net>, brice@...i.com,
sgruszka@...hat.com, netdev@...r.kernel.org
Subject: Re: [PATCH] myr10ge: again fix lro_gen_skb() alignment
Eric Dumazet wrote:
> Andrew Gallatin wrote:
>> Andrew Gallatin wrote:
>>> For variety, I grabbed a different "slow" receiver. This is another
>>> 2-CPU machine, but a dual-socket, single-core Opteron (Tyan S2895).
>>>
>>> processor : 0
>>> vendor_id : AuthenticAMD
>>> cpu family : 15
>>> model : 37
>>> model name : AMD Opteron(tm) Processor 252
>> <...>
>>> The sender was an identical machine running an ancient RHEL4 kernel
>>> (2.6.9-42.ELsmp) and our downloadable (backported) driver.
>>> (http://www.myri.com/ftp/pub/Myri10GE/myri10ge-linux.1.4.4.tgz)
>>> I disabled LRO on the sender.
>>>
>>> Binding the IRQ to CPU0 and the netserver to CPU1, I see 8.1Gb/s with
>>> LRO and 8.0Gb/s with GRO.
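For reference, the binding can be scripted; a minimal Python sketch, where the
IRQ number and the netserver pid are placeholders rather than values from this
box:

import os

IRQ_NUM = 50            # placeholder: the myri10ge receive IRQ on this box
NETSERVER_PID = 1234    # placeholder: pid of the running netserver

# /proc/irq/<n>/smp_affinity takes a hex CPU mask; 0x1 selects CPU0
with open(f"/proc/irq/{IRQ_NUM}/smp_affinity", "w") as f:
    f.write("1\n")

# Pin netserver to CPU1 (same effect as taskset -cp 1 <pid>)
os.sched_setaffinity(NETSERVER_PID, {1})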
>> With the recent patch to fix idle CPU time accounting from LKML applied,
>> it is again possible to trust netperf's service demand (based on %CPU).
>> So here is raw netperf output for LRO and GRO, bound as above.
>>
>> TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
>> hail1-m.sw.myri.com (10.0.130.167) port 0 AF_INET : cpu bind
>> Recv   Send    Send                          Utilization       Service Demand
>> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
>> Size   Size    Size     Time     Throughput  local    remote   local   remote
>> bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
>>
>> LRO:
>>  87380  65536  65536    60.00      8279.36   8.10     77.55    0.160   1.535
>> GRO:
>>  87380  65536  65536    60.00      8053.19   7.86     85.47    0.160   1.739
>>
>> The difference is bigger if you disable TCP timestamps (and thus shrink
>> the packet headers down so they require fewer cachelines):
>> LRO:
>>  87380  65536  65536    60.02      7753.55   8.01     74.06    0.169   1.565
>> GRO:
>>  87380  65536  65536    60.02      7535.12   7.27     84.57    0.158   1.839
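To put rough numbers on the "fewer cachelines" point above (my own arithmetic,
assuming plain Ethernet/IPv4 framing and 64-byte cachelines, not something
measured here):

ETH, IP, TCP, TS_OPT, CACHELINE = 14, 20, 20, 12, 64
with_ts    = ETH + IP + TCP + TS_OPT   # 66 bytes: spills into a second 64B cacheline
without_ts = ETH + IP + TCP            # 54 bytes: fits within a single cacheline
print(with_ts, without_ts)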
>>
>>
>> As you can see, even though the raw bandwidth is very close, the
>> service demand makes it clear that GRO is more expensive
>> than LRO. I just wish I understood why.
>>
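For what it's worth, the receive-side service demand above is consistent with
simply dividing busy CPU time by data moved; a quick sanity check in Python,
assuming netperf scales %CPU by the 2 CPUs in the receiver and counts KB as
1024 bytes:

def service_demand_us_per_kb(throughput_mbps, cpu_util_pct, num_cpus=2):
    kb_per_sec = throughput_mbps * 1e6 / 8 / 1024              # KB received per second
    busy_us_per_sec = (cpu_util_pct / 100.0) * num_cpus * 1e6  # busy CPU-usec per second
    return busy_us_per_sec / kb_per_sec

lro = service_demand_us_per_kb(8279.36, 77.55)   # ~1.53 us/KB, matches the 1.535 above
gro = service_demand_us_per_kb(8053.19, 85.47)   # ~1.74 us/KB, matches the 1.739 above
print(lro, gro, (gro - lro) / lro)               # GRO costs ~13% more CPU per KB received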
>
> What are the "vmstat 1" outputs for both tests? Any difference in, say,
> context switches?
Not much difference is apparent from vmstat, except for a
lower load and slightly higher IRQ rate from LRO:
LRO:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa st
 1  0      0 676960  19280 209812    0    0     0     0 14817    24  0 73 27  0  0
 1  0      0 677084  19280 209812    0    0     0     0 14834    20  0 73 27  0  0
 1  0      0 676916  19280 209812    0    0     0     0 14833    16  0 74 26  0  0
GRO:
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa st
 1  0      0 678244  18008 209784    0    0     0    24 14288    32  0 84 16  0  0
 1  0      0 678268  18008 209788    0    0     0     0 14403    22  0 85 15  0  0
 1  0      0 677956  18008 209788    0    0     0     0 14331    20  0 84 16  0  0
The real difference shows up mainly in mpstat on the CPU handling the
interrupts, where you can see that softirq time is much higher:
LRO:
07:15:16  CPU  %user  %nice   %sys %iowait   %irq  %soft %steal  %idle   intr/s
07:15:17    0   0.00   0.00   0.00    0.00   0.00  45.00   0.00  55.00 12907.92
07:15:18    0   0.00   0.00   1.00    0.00   2.00  43.00   0.00  54.00 12707.92
07:15:19    0   0.00   0.00   1.00    0.00   0.00  46.00   0.00  53.00 12825.00
GRO:
07:11:59  CPU  %user  %nice   %sys %iowait   %irq  %soft %steal  %idle   intr/s
07:12:00    0   0.00   0.00   0.00    0.00   0.99  66.34   0.00  32.67 12242.57
07:12:01    0   0.00   0.00   0.00    0.00   1.01  66.67   0.00  32.32 12220.00
07:12:02    0   0.00   0.00   0.99    0.00   0.99  65.35   0.00  32.67 12336.00
So it looks like "something" GRO is doing in softirq context is more
expensive than what LRO is doing.
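One way to corroborate the mpstat %soft numbers is to sample /proc/stat
directly for CPU0; a minimal sketch, assuming the usual field order
(user nice system idle iowait irq softirq ...):

import time

def cpu0_softirq_share(interval=1.0):
    def snap():
        with open("/proc/stat") as f:
            for line in f:
                if line.startswith("cpu0 "):
                    return [int(x) for x in line.split()[1:]]
    before = snap()
    time.sleep(interval)
    after = snap()
    delta = [b - a for a, b in zip(before, after)]
    return delta[6] / sum(delta)    # softirq ticks / all ticks on cpu0

print(f"cpu0 softirq share: {cpu0_softirq_share():.1%}")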
Drew