Message-ID: <4787B9E7.6040001@hp.com>
Date:	Fri, 11 Jan 2008 10:48:07 -0800
From:	Rick Jones <rick.jones2@...com>
To:	Breno Leitao <leitao@...ux.vnet.ibm.com>
CC:	Eric Dumazet <dada1@...mosbay.com>,
	"Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
	netdev@...r.kernel.org
Subject: Re: e1000 performance issue in 4 simultaneous links

Breno Leitao wrote:
> On Fri, 2008-01-11 at 17:48 +0100, Eric Dumazet wrote:
> 
>>Breno Leitao a écrit :
>>
>>>Take a look at the interrupt table this time: 
>>>
>>>io-dolphins:~/leitao # cat /proc/interrupts  | grep eth[1]*[67]
>>>277:         15    1362450         13         14         13         14         15         18   XICS      Level     eth6
>>>278:         12         13    1348681         19         13         15         10         11   XICS      Level     eth7
>>>323:         11         18         17    1348426         18         11         11         13   XICS      Level     eth16
>>>324:         12         16         11         19    1402709         13         14         11   XICS      Level     eth17
>>>
>>>
>>>  
>>
>>If your machine has 8 cpus, then your vmstat output shows a bottleneck :)
>>
>>(100/8 = 12.5), so I guess one of your CPU is full
> 
> 
> Well, if I run top while running the test, I see this load distributed
> among the CPUs, mainly those that had a NIC IRQ bound. Take a look:
> 
> Tasks: 133 total,   2 running, 130 sleeping,   0 stopped,   1 zombie
> Cpu0  :  0.3%us, 19.5%sy,  0.0%ni, 73.5%id,  0.0%wa,  0.0%hi,  0.0%si,  6.6%st
> Cpu1  :  0.0%us,  0.0%sy,  0.0%ni, 75.1%id,  0.0%wa,  0.7%hi, 24.3%si,  0.0%st
> Cpu2  :  0.0%us,  0.0%sy,  0.0%ni, 73.1%id,  0.0%wa,  0.7%hi, 26.2%si,  0.0%st
> Cpu3  :  0.0%us,  0.0%sy,  0.0%ni, 76.1%id,  0.0%wa,  0.7%hi, 23.3%si,  0.0%st
> Cpu4  :  0.0%us,  0.3%sy,  0.0%ni, 70.4%id,  0.7%wa,  0.3%hi, 28.2%si,  0.0%st
> Cpu5  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu6  :  0.0%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
> Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

If you have IRQs bound to CPUs 1-4, and have four netperfs running, 
given that the stack ostensibly tries to have applications run on the 
same CPUs, what is running on CPU0?

Is it related to:

>   The 2 interface test that I showed in my first email was run on two
> different NICs. Also, I am running netperf with the following command
> "netperf -H <hostname> -T 0,8" while netserver is running without any
> argument at all. Also, running vmstat in parallel shows that there is no
> bottleneck in the CPU. Take a look: 

Unless you have a morbid curiosity :) there isn't much point in binding 
all the netperfs to CPU 0 when the interrupts for the NICs servicing 
their connections are on CPUs 1-4.  I also assume then that the 
system(s) on which netserver is running have > 8 CPUs in them? (There 
are multiple destination systems, yes?)

Does anything change if you explicitly bind each netperf to the CPU on 
which the interrupts for its connection are processed?  Or for that 
matter if you remove the -T option entirely?
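
As a rough illustration of picking the CPU to bind each netperf to, here
is a minimal Python sketch that reads /proc/interrupts-style text and
reports, per interface, the CPU with the largest interrupt count.  The
function name busiest_cpu_per_irq and the SAMPLE text are hypothetical
(SAMPLE is just the table quoted above), and the CPU count of 8 is an
assumption taken from the top output; it is not part of netperf.

```python
def busiest_cpu_per_irq(text, ncpus):
    """Map the last field of each IRQ line (the device name) to the
    index of the CPU column holding the largest interrupt count."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Expect "<irq>:" followed by one count per CPU, then a description.
        if len(fields) < ncpus + 2 or not fields[0].endswith(":"):
            continue
        try:
            counts = [int(f) for f in fields[1:1 + ncpus]]
        except ValueError:
            continue  # not a per-CPU counter line
        result[fields[-1]] = counts.index(max(counts))
    return result


# Hypothetical sample: the four lines quoted earlier in this thread.
SAMPLE = """\
277:         15    1362450         13         14         13         14         15         18   XICS      Level     eth6
278:         12         13    1348681         19         13         15         10         11   XICS      Level     eth7
323:         11         18         17    1348426         18         11         11         13   XICS      Level     eth16
324:         12         16         11         19    1402709         13         14         11   XICS      Level     eth17
"""

if __name__ == "__main__":
    # Assumption: 8 CPUs, as in the top output quoted above.
    for name, cpu in sorted(busiest_cpu_per_irq(SAMPLE, 8).items()):
        print(f"{name}: interrupts mostly on CPU{cpu}")
```

On the quoted table this maps eth6, eth7, eth16 and eth17 to CPUs 1, 2,
3 and 4 respectively; those would be the candidate values for the local
half of -T in place of 0.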

Does UDP_STREAM show different performance than TCP_STREAM?  (I'm 
ass-u-me-ing, based on the above, that we are looking at the netperf 
side of a TCP_STREAM test; please correct me if otherwise.)

Are the CPUs above single-core CPUs or multi-core CPUs, and if 
multi-core, are caches shared?  How are CPUs numbered if multi-core on 
that system?  Is there any hardware threading involved?  I'm wondering 
if there may be some wrinkles in the system that might lead to reported 
CPU utilization being low even if a chip is otherwise saturated.  Might 
need some HW counters to check that...

Can you describe the I/O subsystem more completely?  I understand that 
you are using at most two ports of a pair of quad-port cards at any one 
time, but am still curious to know if those two cards are on separate 
busses, or if they share any bus/link on the way to memory.

rick jones