Message-ID: <4787B9E7.6040001@hp.com>
Date: Fri, 11 Jan 2008 10:48:07 -0800
From: Rick Jones <rick.jones2@...com>
To: Breno Leitao <leitao@...ux.vnet.ibm.com>
CC: Eric Dumazet <dada1@...mosbay.com>,
"Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
netdev@...r.kernel.org
Subject: Re: e1000 performance issue in 4 simultaneous links
Breno Leitao wrote:
> On Fri, 2008-01-11 at 17:48 +0100, Eric Dumazet wrote:
>
>>Breno Leitao wrote:
>>
>>>Take a look at the interrupt table this time:
>>>
>>>io-dolphins:~/leitao # cat /proc/interrupts | grep eth[1]*[67]
>>>277: 15 1362450 13 14 13 14 15 18 XICS Level eth6
>>>278: 12 13 1348681 19 13 15 10 11 XICS Level eth7
>>>323: 11 18 17 1348426 18 11 11 13 XICS Level eth16
>>>324: 12 16 11 19 1402709 13 14 11 XICS Level eth17
>>
>>If your machine has 8 CPUs, then your vmstat output shows a bottleneck :)
>>
>>(100/8 = 12.5), so I guess one of your CPUs is saturated.
>
>
> Well, if I run top while running the test, I see this load distributed
> among the CPUs, mainly those that have a NIC IRQ bound to them. Take a
> look:
>
> Tasks: 133 total, 2 running, 130 sleeping, 0 stopped, 1 zombie
> Cpu0 : 0.3%us, 19.5%sy, 0.0%ni, 73.5%id, 0.0%wa, 0.0%hi, 0.0%si, 6.6%st
> Cpu1 : 0.0%us, 0.0%sy, 0.0%ni, 75.1%id, 0.0%wa, 0.7%hi, 24.3%si, 0.0%st
> Cpu2 : 0.0%us, 0.0%sy, 0.0%ni, 73.1%id, 0.0%wa, 0.7%hi, 26.2%si, 0.0%st
> Cpu3 : 0.0%us, 0.0%sy, 0.0%ni, 76.1%id, 0.0%wa, 0.7%hi, 23.3%si, 0.0%st
> Cpu4 : 0.0%us, 0.3%sy, 0.0%ni, 70.4%id, 0.7%wa, 0.3%hi, 28.2%si, 0.0%st
> Cpu5 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu6 : 0.0%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
> Cpu7 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
If you have IRQs bound to CPUs 1-4 and four netperfs running, then,
given that the stack ostensibly tries to have applications run on the
same CPUs as their connections, what is running on CPU0?
Is it related to:
> The 2 interface test that I showed in my first email, was run in two
> different NIC. Also, I am running netperf with the following command
> "netperf -H <hostname> -T 0,8" while netserver is running without any
> argument at all. Also, running vmstat in parallel shows that there is no
> bottleneck in the CPU. Take a look:
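One quick way to see what is actually running on CPU0 is to ask ps for
each task's most recent CPU (the psr column); this assumes a procps
recent enough to know --sort:

  ps -eo pid,psr,pcpu,comm --sort=-pcpu | head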
Unless you have a morbid curiosity :) there isn't much point in binding
all the netperfs to CPU 0 when the interrupts for the NICs servicing
their connections are on CPUs 1-4. I also assume, then, that the
system(s) on which netserver is running have more than 8 CPUs in them?
(There are multiple destination systems, yes?)
Does anything change if you explicitly bind each netperf to the CPU on
which the interrupts for its connection are processed? Or, for that
matter, if you remove the -T option entirely?
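In concrete terms, and assuming the IRQ-to-CPU placement from your
/proc/interrupts output above still holds (eth6 on CPU1, eth7 on CPU2,
eth16 on CPU3, eth17 on CPU4), that would look something like the
following, where host1 through host4 are stand-ins for your actual
destination systems:

  netperf -H host1 -T 1,1 &   # eth6: IRQ on CPU1
  netperf -H host2 -T 2,2 &   # eth7: IRQ on CPU2
  netperf -H host3 -T 3,3 &   # eth16: IRQ on CPU3
  netperf -H host4 -T 4,4 &   # eth17: IRQ on CPU4
  wait

The second value to -T binds the remote netserver to that CPU on the
remote system, same as in your -T 0,8 invocation.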
Does UDP_STREAM show different performance than TCP_STREAM? (I'm
ass-u-me-ing, based on the above, that we are looking at the netperf
side of a TCP_STREAM test; please correct me if otherwise.)
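Something along these lines, with <hostname> as before; the -m 1472
keeps each datagram within a 1500-byte MTU (1500 minus 20 bytes of IP
header and 8 of UDP) so the comparison isn't muddied by fragmentation:

  netperf -H <hostname> -t UDP_STREAM -- -m 1472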
Are the CPUs above single-core or multi-core, and if multi-core, are
their caches shared? How are the CPUs numbered on that system if
multi-core? Is there any hardware threading involved? I'm wondering if
there may be some wrinkles in the system that could lead to reported
CPU utilization being low even though a chip is otherwise saturated.
Might need some HW counters to check that...
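If the kernel on that box exposes the topology bits in sysfs (most 2.6
kernels do), a quick loop like this should show which logical CPUs
share a core or a package:

  for c in /sys/devices/system/cpu/cpu*/topology; do
      echo "$c: core $(cat $c/core_id), package $(cat $c/physical_package_id)"
  done

Matching core/package pairs on different cpuN entries would suggest
hardware threads sharing a core.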
Can you describe the I/O subsystem more completely? I understand that
you are using at most two ports of a pair of quad-port cards at any one
time, but am still curious to know if those two cards are on separate
busses, or if they share any bus/link on the way to memory.
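If lspci is available on the box, the tree view should answer that at a
glance; look at whether the two quad-port cards sit under the same host
bridge or bus:

  lspci -tv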
rick jones