Message-ID: <49A6CE39.5050200@athenacr.com>
Date:	Thu, 26 Feb 2009 12:15:37 -0500
From:	Kenny Chang <kchang@...enacr.com>
To:	netdev@...r.kernel.org
Subject: Re: Multicast packet loss

Neil Horman wrote:
> On Tue, Feb 03, 2009 at 12:34:54PM -0500, Kenny Chang wrote:
>> Eric Dumazet wrote:
>>> Wes Chow wrote:
>>>   
>>>> Eric Dumazet wrote:
>>>>     
>>>>> Wes Chow wrote:
>>>>>       
>>>>>> (I'm Kenny's colleague, and I've been doing the kernel builds)
>>>>>>
>>>>>> First I'd like to note that there were a lot of bnx2 NAPI changes
>>>>>> between 2.6.21 and 2.6.22. As a reminder, 2.6.21 shows tiny amounts
>>>>>> of packet loss, whereas loss in 2.6.22 is significant.
>>>>>>
>>>>>> Second, some CPU affinity info: if I do like Eric and pin all of the
>>>>>> apps onto a single CPU, I see no packet loss. Also, I do *not* see
>>>>>> ksoftirqd show up on top at all!
>>>>>>
>>>>>> If I pin half the processes on one CPU and the other half on another
>>>>>> CPU, one ksoftirqd process shows up in top and completely pegs one
>>>>>> CPU. My packet loss in that case is significant (25%).
>>>>>>
>>>>>> Now, the strange case: if I pin 3 processes to one CPU and 1 process
>>>>>> to another, I get about 25% packet loss and ksoftirqd pins one CPU.
>>>>>> However, one of the apps takes significantly less CPU than the
>>>>>> others, and all apps lose the *exact same number of packets*. In all
>>>>>> other situations where we see packet loss, the actual number lost per
>>>>>> application instance appears random.
>>>>>>         
>>>>> You see the same number of packets lost because they are lost at the NIC level
>>>>>       
>>>> Understood.
>>>>
>>>> I have a new observation: if I pin processes to just CPUs 0 and 1, I see
>>>> no packet loss. Pinning to 0 and 2, I do see packet loss. Pinning 2 and
>>>> 3, no packet loss. 4 & 5 - no packet loss, 6 & 7 - no packet loss. Any
>>>> other combination appears to produce loss (though I have not tried all
>>>> 28 combinations, this seems to be the case).
>>>>
>>>> At first I thought maybe it had to do with processes pinned to the same
>>>> CPU, but different cores. The machine is a dual quad core, which means
>>>> that CPUs 0-3 should be a physical CPU, correct? Pinning to 0/2 and 0/3
>>>> produce packet loss.
>>>>     
>>> A quad core is really a 2 x 2 core.
>>>
>>> The L2 cache is split into two blocks, one used by CPU0/1, the other by
>>> CPU2/3.
>>>
>>> You are at the limit of the machine with such a workload, so as soon as your
>>> CPUs have to transfer 64-byte lines between those two L2 blocks, you lose.
>>>
>>>
>>>   
>>>> I've also noticed that it does not matter which of the working pairs I
>>>> pin to. For example, pinning 5 processes in any combination on either
>>>> 0/1 produce no packet loss, pinning all 5 to just CPU 0 also produces no
>>>> packet loss.
>>>>
>>>> The failures are also sudden. In all of the working cases mentioned
>>>> above, I don't see ksoftirqd on top at all. But when I run 6 processes
>>>> on a single CPU, ksoftirqd shoots up to 100% and I lose a huge number of
>>>> packets.
>>>>
>>>>     
>>>>> Normally, softirq runs on the same CPU (the one handling the hard irq)
>>>>>       
>>>> What determines which CPU the hard irq occurs on?
>>>>
>>>>     
>>> Check /proc/irq/{irqnumber}/smp_affinity
>>>
>>> If you want IRQ 16 served only by CPU0:
>>>
>>> echo 1 >/proc/irq/16/smp_affinity
>>>
>>>   
>> Hi everyone,
>>
>> -snip-
>> Correct me if I'm wrong, but from what we've seen, it looks like it's
>> pointing to some inefficiency in the softirq handling.  The question is
>> whether it's something in the driver or the kernel.  If we can isolate
>> that, maybe we can take some action to have it fixed.
>>
> I don't think it's softirq inefficiencies (oprofile would have shown that).  I
> know I keep harping on this, but I still think irq affinity is your problem.
> I'd be interested in knowing what your /proc/interrupts file looked like on
> each of the above kernels.  Perhaps it's not that the bnx2 card you have can't
> handle the setting of MSI interrupt affinities, but rather that something
> changed to break irq affinity on this card.
>
> Neil
>
>
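For anyone wanting to collect the /proc/interrupts data Neil asks for, the
following is a minimal sketch, not something used in this thread.  It assumes
the usual layout of /proc/interrupts (a header row of CPU names, then one row
per IRQ with the device name as the last field).  The function name
irq_counts and the interface name eth0 are made up for illustration.

```shell
#!/bin/sh
# irq_counts: hypothetical helper, not part of any tool named in this thread.
# Prints the per-CPU interrupt counts for one device.
#   $1 = device name as it appears in the last column (e.g. eth0, assumed)
#   $2 = file to read (defaults to /proc/interrupts)
irq_counts() {
    awk -v dev="$1" '
        NR == 1 { ncpu = NF; next }       # header row: CPU0 CPU1 ...
        $NF == dev {                      # rows whose last field is the device
            irq = $1
            sub(/:$/, "", irq)            # "16:" -> "16"
            printf "irq %s", irq
            for (i = 1; i <= ncpu; i++)
                printf " CPU%d=%s", i - 1, $(i + 1)
            printf "\n"
        }
    ' "${2:-/proc/interrupts}"
}

# Usage (the interface name is an assumption):
#   irq_counts eth0
```

Running it before and after a test on each kernel would show which CPUs
actually serviced the NIC's interrupts.  An IRQ line shared between several
devices lists them all in the last column, and this sketch does not handle
that case.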
It's been a while since I updated this thread.  We've been running
through the different suggestions and tabulating their effects, as well
as trying out an Intel card.  The short story is that setting IRQ
affinity and using MSI help to some extent, and the Intel card doesn't
seem to change things significantly.  The results don't seem consistent
enough for us to be able to point to a smoking gun.
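
As a reminder of what "setting affinity" means here: the value written to
/proc/irq/<N>/smp_affinity is a hexadecimal bitmask with bit i set when CPU i
may service the interrupt.  Below is a minimal sketch of building that mask
from a list of CPU numbers.  The helper name cpus_to_mask is made up for
illustration, and IRQ 16 is only the example number quoted earlier in the
thread.

```shell
#!/bin/sh
# cpus_to_mask: hypothetical helper, not part of any tool named in this thread.
# Prints the hex bitmask for the CPU numbers given as arguments, in the form
# accepted by /proc/irq/<N>/smp_affinity (bit i set = CPU i allowed).
cpus_to_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

cpus_to_mask 0      # prints 1 (CPU0 only, the same value as "echo 1")
cpus_to_mask 0 1    # prints 3
cpus_to_mask 2 3    # prints c

# Applying a mask needs root; the IRQ number is an assumption:
#   cpus_to_mask 0 > /proc/irq/16/smp_affinity
```

The "CPU0" rows in the table below correspond to a mask of 1.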

It does look like the 2.6.29-rc4 kernel performs okay with the Intel
card, but this is not a real-time build and it's not likely to be in a
supported Ubuntu distribution any time soon.  We've reached the point
where we'd like to find an expert who can dedicate some time to this
problem.  The end goal is a realtime configuration on a reasonably
"aged" kernel (2.6.24 through 2.6.28) whose multicast performance is
greater than or equal to that of 2.6.15.

If anybody is interested in devoting some compensated time to this 
issue, we're offering up a bounty: 
http://www.athenacr.com/bounties/multicast-performance/

For completeness, here's the table of our experiment results:

===================== ================ ===== ========= =============== =============== =============== ===============
Kernel                flavor           IRQ   affinity  *4x mcasttest*  *5x mcasttest*  *6x mcasttest*  *Mtools2* [4]_
===================== ================ ===== ========= =============== =============== =============== ===============
Intel e1000e
----------------------------------------------------------------------------------------------------------------------
2.6.24.19             rt                     any       OK              Maybe           X
2.6.24.19             rt                     CPU0      OK              OK              X
2.6.24.19             generic                any       X
2.6.24.19             generic                CPU0      OK
2.6.29-rc3            vanilla-server         any       X
2.6.29-rc3            vanilla-server         CPU0      OK
2.6.29-rc4            vanilla-generic        any       X                                               OK
2.6.29-rc4            vanilla-generic        CPU0      OK              OK              OK [5]_         OK
----------------------------------------------------------------------------------------------------------------------
Broadcom BNX2
----------------------------------------------------------------------------------------------------------------------
2.6.24-19             rt               MSI   any       OK              OK              X
2.6.24-19             rt               MSI   CPU0      OK              Maybe           X
2.6.24-19             rt               APIC  any       OK              OK              X
2.6.24-19             rt               APIC  CPU0      OK              Maybe           X
2.6.24-19-bnx-latest  rt               APIC  CPU0      OK              X
2.6.24-19             server           MSI   any       X
2.6.24-19             server           MSI   CPU0      OK
2.6.24-19             generic          APIC  any       X
2.6.24-19             generic          APIC  CPU0      OK
2.6.27-11             generic          APIC  any       X
2.6.27-11             generic          APIC  CPU0      OK              10% drop
2.6.28-8              generic          APIC  any       OK              X
2.6.28-8              generic          APIC  CPU0      OK              OK              0.5% drop
2.6.29-rc3            vanilla-server   MSI   any       X
2.6.29-rc3            vanilla-server   MSI   CPU0      X
2.6.29-rc3            vanilla-server   APIC  any       OK              X
2.6.29-rc3            vanilla-server   APIC  CPU0      OK              OK
2.6.29-rc4            vanilla-generic  APIC  any       X
2.6.29-rc4            vanilla-generic  APIC  CPU0      OK              3% drop         10% drop        X
===================== ================ ===== ========= =============== =============== =============== ===============

* [4] MTools2 is a test from 29West: http://www.29west.com/docs/TestNet/
* [5] In 5 trials, 1 of the trials dropped 2%, 4 of the trials dropped nothing.

Kenny
