Message-ID: <48051734.1000107@redhat.com>
Date: Tue, 15 Apr 2008 16:59:32 -0400
From: Chris Snook <csnook@...hat.com>
To: "Kok, Auke" <auke-jan.h.kok@...el.com>
CC: "H. Willstrand" <h.willstrand@...il.com>,
Anton Titov <a.titov@...t.bg>, netdev@...r.kernel.org,
Jesse Brandeburg <jesse.brandeburg@...el.com>
Subject: Re: Bad network performance over 2Gbps
Kok, Auke wrote:
> H. Willstrand wrote:
>> [Changed mail list]
>>
>> On Tue, Apr 15, 2008 at 8:06 PM, Anton Titov <a.titov@...t.bg> wrote:
>>> I use Linux for serving a huge amount of static web content on a few
>>> servers. When network traffic goes above 2Gbit/sec, ksoftirqd/5 (not
>>> always 5, but always just one) starts using exactly 100% CPU time, and
>>> packet loss starts preventing the traffic from going any higher. When
>>> the network traffic is lower than 1.9Gbit, the ksoftirqds use 0% CPU
>>> according to top.
>>>
>>> The uplink is six gigabit Intel cards bonded together using the 802.3ad
>>> algorithm with xmit_hash_policy set to layer3+4. On the other side is a
>>> Cisco 2960 switch. The machine has two quad-core Intel Xeons @ 2.33GHz.
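>>>
>>> For reference, the bond is set up roughly like this (just a sketch; the
>>> interface names and the miimon value are illustrative, not my exact
>>> configuration):
>>>
>>>   modprobe bonding mode=802.3ad miimon=100 xmit_hash_policy=layer3+4
>>>   ip link set bond0 up
>>>   ifenslave bond0 eth0 eth1 eth2 eth3 eth4 eth5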
>>>
>>> Here is a screenshot of the "top" command. The described behavior has
>>> nothing to do with the 13% io-wait; it happens even when io-wait is 0%.
>>> http://www.titov.net/misc/top-snap.png
>>>
>>> kernel configuration:
>>> http://www.titov.net/misc/config.gz
>>>
>>> /proc/interrupts, lspci, dmesg (nothing interesting there), ifconfig,
>>> uname -a:
>>> http://www.titov.net/misc/misc.txt.gz
>>>
>>> Is it a Linux bug or some hardware limitation?
>
> I'm wondering if this is not a classic demonstration of the NAPI-irq trap, where
> after migration all the interrupts from the various cards end up on a single
> CPU and, because of NAPI, once that CPU is busy polling they never migrate
> away from it again.
>
> Have you looked at `cat /proc/interrupts` before and after this happens?
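>
> Something like this (just a rough sketch) would show whether all six NIC
> interrupts end up being serviced by the same CPU:
>
>   cat /proc/interrupts > irqs.before
>   # ... wait until traffic goes above 2Gbit/sec and ksoftirqd hits 100% ...
>   cat /proc/interrupts > irqs.after
>   diff irqs.before irqs.after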
>
> My guess is that your specific situation can benefit from setting smp_affinity
> and pinning the NIC IRQs so that you at least spread the load over multiple
> CPUs (but preferably ones that share the same cache!); that should help relieve
> the situation.
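>
> A minimal sketch of what I mean (the IRQ numbers and the CPU mask are made
> up; take the real numbers from /proc/interrupts, and note that irqbalance,
> if running, may rewrite these settings behind your back):
>
>   # pin the six NIC interrupts to CPUs 0 and 1 (hex mask 0x03)
>   for irq in 16 17 18 19 20 21; do
>       echo 03 > /proc/irq/$irq/smp_affinity
>   done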
>
> Alternatively, you might even see an improvement by disabling NAPI; depending
> on the driver that you're using, this might be possible.
>
> I actually don't know much about bonding and how this affects everything, but my
> guess is that that's a less important factor in this issue.
>
> Cheers,
>
> Auke
I'm not sure that spreading IRQs out completely is necessarily a good
idea, due to cache line ping-pong. I suspect you'll get optimal
performance by assigning the six IRQs to two cores that share an L2 cache.
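
A rough way to find such a pair (assuming your kernel exposes the sysfs
cache topology files; index2 is normally the L2 on these CPUs):

  # logical CPUs sharing an L2 with cpu0, as a hex mask
  cat /sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_map

If that prints e.g. 00000003, cpu0 and cpu1 share an L2, and you'd split
the six smp_affinity masks between those two CPUs.
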
Still, I think you're on to something here. Disabling NAPI and instead
tuning the cards' interrupt coalescing settings might allow irqbalance
to do a better job than it does currently.
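
Something along these lines as a starting point (assuming eth0-eth5 are the
slaves and that the driver exposes coalescing through ethtool -C; otherwise
e1000's InterruptThrottleRate module parameter is the equivalent knob, and
125 usec is just a value to experiment with):

  # show the current coalescing settings
  ethtool -c eth0
  # make each card interrupt less often under load
  for i in 0 1 2 3 4 5; do ethtool -C eth$i rx-usecs 125; done
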
-- Chris