Date:	Thu, 5 Feb 2009 08:29:56 -0500
From:	Neil Horman <nhorman@...driver.com>
To:	Kenny Chang <kchang@...enacr.com>
Cc:	netdev@...r.kernel.org
Subject: Re: Multicast packet loss

On Wed, Feb 04, 2009 at 11:07:13AM -0500, Kenny Chang wrote:
> Neil Horman wrote:
>> On Tue, Feb 03, 2009 at 10:20:13AM -0500, Kenny Chang wrote:
>>   
>>> Neil Horman wrote:
>>>     
>>>> On Mon, Feb 02, 2009 at 11:48:25AM -0500, Kenny Chang wrote:
>>>>         
>>>>> Neil Horman wrote:
>>>>>             
>>>>>> On Fri, Jan 30, 2009 at 11:41:23PM +0100, Eric Dumazet wrote:
>>>>>>                   
>>>>>>> Kenny Chang wrote:
>>>>>>>                         
>>>>>>>> Ah, sorry, here's the test program attached.
>>>>>>>>
>>>>>>>> We've tried 2.6.28.1, but no, we haven't tried 2.6.28.2 or
>>>>>>>> 2.6.29-rcX.
>>>>>>>>
>>>>>>>> Right now, we are trying to step through the kernel versions until we
>>>>>>>> see where the performance drops significantly.  We'll try 2.6.29-rc soon
>>>>>>>> and post the result.
>>>>>>>>                               
>>>>>>> 2.6.29-rc contains UDP receive improvements (lockless).
>>>>>>>
>>>>>>> The problem is that multicast handling hasn't been updated yet, but it could be :)
>>>>>>>
>>>>>>>
>>>>>>> I was asking you for "cat /proc/interrupts" because I believe you might
>>>>>>> have a problem with NIC interrupts being handled by one CPU only (when
>>>>>>> you are having problems).
>>>>>>>
>>>>>>>                         
>>>>>> That would be expected (if irqbalance is running), and desirable, since
>>>>>> spreading high-volume interrupts like a NIC's across multiple cores (or more
>>>>>> specifically multiple L2 caches) is going to increase your cache line miss
>>>>>> rate significantly and decrease rx throughput.
>>>>>>
>>>>>> Although you do have a point here: if the system isn't running irqbalance, and
>>>>>> the NIC's irq affinity is spread across multiple L2 caches, that would be a
>>>>>> point of improvement performance-wise.
>>>>>>
>>>>>> Kenny, if you could provide the /proc/interrupts info along with /proc/cpuinfo
>>>>>> and your stats that I asked about earlier, that would be a big help.
>>>>>>
>>>>>> Regards
>>>>>> Neil
>>>>>>
>>>>>>                   
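
(As an aside, a quick way to eyeball how the NIC's interrupts are spread over
the CPUs is a small helper along these lines.  This is just a sketch -- the
"eth" default is a guess; match whatever name your NIC shows up under in
/proc/interrupts:)

/*
 * Print the CPU header row of /proc/interrupts plus any IRQ line whose
 * name contains the string given on the command line.  The per-CPU
 * columns then show which core(s) are taking the NIC's interrupts.
 */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
	const char *match = (argc > 1) ? argv[1] : "eth";   /* assumption */
	char line[4096];
	FILE *f = fopen("/proc/interrupts", "r");

	if (!f) {
		perror("fopen /proc/interrupts");
		return 1;
	}

	/* First line names the CPUs, so the columns below make sense. */
	if (fgets(line, sizeof(line), f))
		fputs(line, stdout);

	while (fgets(line, sizeof(line), f))
		if (strstr(line, match))
			fputs(line, stdout);   /* per-CPU counts for this IRQ */

	fclose(f);
	return 0;
}
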
>>>>> This is for a working setup.
>>>>>
>>>>>             
>>>> Are these quad core systems?  Or dual core w/ hyperthreading?  I ask because
>>>> in your working setup you have half the number of CPUs, and I was not sure if
>>>> you removed an entire package or if you just disabled hyperthreading.
>>>>
>>>>
>>>> Neil
>>>>
>>>>         
>>> Yeah, these are quad core systems.  The 8-CPU system is a
>>> dual-processor quad-core.  The other is my desktop, a single-CPU
>>> quad-core.
>>>
>>>     
>> Ok, so they're separate systems then.  Did you actually experience drops on the
>> 8-core system since the last reboot?  I ask because even when it's distributed
>> across all 8 cores, you only have about 500 total interrupts from the NIC, and
>> if you did get drops, something more than just affinity is wrong.
>>
>> Regardless, spreading interrupts across cores is definitely a problem.  As Eric
>> says, quad core chips are actually 2x2 cores, so you'll want to either just run
>> irqbalance to assign an appropriate affinity to the NIC, or manually look at
>> each core's physical id and sibling id to assign affinity to a core or cores
>> that share an L2 cache.  As you've found, you may also need to disable MSI
>> interrupt mode on your bnx2 driver.  That kinda stinks, but bnx2 IIRC isn't
>> multiqueue, so it's not like MSI provides you any real performance gain.
>>
>> Neil
>>
>>   
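
(As a concrete example of the pinning I mean above -- just a sketch; the CPU
number is a placeholder, pick one that shares an L2 cache with the core taking
the NIC interrupt (compare "physical id" and "core id" in /proc/cpuinfo), and
set the NIC side separately by writing a hex mask to /proc/irq/<N>/smp_affinity:)

/*
 * Pin the calling process to one CPU before starting the receive loop.
 * Could equally be done from outside with taskset(1).
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	cpu_set_t set;
	int cpu = 3;		/* placeholder: adjust for your topology */

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);

	/* pid 0 means the calling process */
	if (sched_setaffinity(0, sizeof(set), &set) != 0) {
		perror("sched_setaffinity");
		return 1;
	}

	printf("pinned pid %d to cpu %d\n", (int)getpid(), cpu);
	/* ... multicast receive loop would run here ... */
	return 0;
}
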
> Hi Neil,
>
> Yeah, we've been rebooting this system left and right switching kernels.
> The results are fairly consistent.  We were able to set the irq
> affinities, and as Wes had mentioned, what we see is that if we pin the
> softirq to 1 core and pin the app to its sibling, we see really good
> performance, but as we load up other cores, the machine reaches a
> breaking point where all hell breaks loose and we drop a bunch.  (We
> hadn't turned off MSI, btw.)
>
> While we were able to tune and adjust performance like that, in the end
> it doesn't really explain the difference between earlier and recent
> kernels, and it also doesn't quite explain the difference between machines.
>
> You mentioned it would be good to see the interrupts for each kernel; in
> light of the above information, would it still be useful for me to
> provide that?
>
In light of what you said, I probably don't need to see it, no, although if you
go through testing on all these kernels again, I would suggest you take a look
at the /proc/interrupts numbers yourself.  Like you said, it's odd that this
behavior changed, since the fast receive path is fairly consistent.  It may be
that the NIC driver you're using (bnx2, I think?) had a change that either broke
the ability to set affinity for MSI interrupts, forcing an irq spread and
killing performance, or perhaps some large inefficiency was introduced either in
the interrupt handler itself or in the NAPI poll method of the driver.  Another
good analysis technique would be to grab the latest kernel (which is 'broken', I
think your chart indicated) and the NIC driver from the last working kernel.
Merge the driver into the latest kernel and see if the problem persists.  If
not, that's a pretty good indicator that a change in the driver has at least
some culpability.
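
If you do rerun the matrix of kernels, it may also help to use as stripped-down
a receiver as you can, to take application overhead out of the picture.  Roughly
something like this (just a sketch, not your attached program -- the group
address, port, and progress interval are made-up placeholders):

/*
 * Minimal multicast receiver: join a group, count datagrams, and print
 * progress so the total can be compared against what the sender transmitted.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	struct sockaddr_in addr;
	struct ip_mreq mreq;
	char buf[2048];
	long count = 0;

	if (fd < 0) {
		perror("socket");
		return 1;
	}

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(12345);			/* placeholder port */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		perror("bind");
		return 1;
	}

	/* 239.1.1.1 is a placeholder group address. */
	memset(&mreq, 0, sizeof(mreq));
	mreq.imr_multiaddr.s_addr = inet_addr("239.1.1.1");
	mreq.imr_interface.s_addr = htonl(INADDR_ANY);
	if (setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)) < 0) {
		perror("IP_ADD_MEMBERSHIP");
		return 1;
	}

	while (recv(fd, buf, sizeof(buf), 0) > 0)
		if (++count % 100000 == 0)		/* placeholder interval */
			printf("received %ld packets\n", count);

	close(fd);
	return 0;
}

Comparing the count each receiver reports against what the sender transmitted
should keep the per-kernel comparison apples-to-apples.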

Best
Neil

> Kenny
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
