Message-ID: <1321382140.2856.38.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>
Date:	Tue, 15 Nov 2011 19:35:40 +0100
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Rick Jones <rick.jones2@...com>
Cc:	Andy Gospodarek <andy@...yhouse.net>,
	Jiri Pirko <jpirko@...hat.com>, netdev@...r.kernel.org,
	davem@...emloft.net, bhutchings@...arflare.com,
	shemminger@...tta.com, fubar@...ibm.com, tgraf@...radead.org,
	ebiederm@...ssion.com, mirqus@...il.com, kaber@...sh.net,
	greearb@...delatech.com, jesse@...ira.com, fbl@...hat.com,
	benjamin.poirier@...il.com, jzupka@...hat.com, ivecera@...hat.com
Subject: Re: [patch net-next V8] net: introduce ethernet teaming device

On Tuesday 15 November 2011 at 09:22 -0800, Rick Jones wrote:
> > On most modern systems I suspect there will be little to no difference
> > between bonding RX performance and team performance.
> >
> > If there is any now, I suspect team and bond performance to be similar
> > by the time team has to account for the corner-cases bonding has already
> > resolved.  :-)
> >
> > Benchmarks may prove otherwise, but I've yet to see Jiri produce
> > anything.  My initial testing doesn't demonstrate any measurable
> > differences with 1Gbps interfaces on a multi-core, multi-socket system.
> 
> I wouldn't expect much difference in terms of bandwidth; I was thinking 
> the demonstration would be made in the area of service demand (CPU 
> consumed per unit work) and perhaps aggregate packets per second.

Well,

bonding is a NETIF_F_LLTX driver, but it uses the following rwlock in
its xmit path:

read_lock(&bond->curr_slave_lock);
...
read_unlock(&bond->curr_slave_lock);

Two atomic operations on a contended cache line.

On a 16-CPU machine, here is some "perf stat" data for such a workload
(each thread doing 10,000,000 atomic_inc(&somesharedvar)):
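
The source of the ./atomic test isn't included here; a minimal sketch of
such a microbenchmark, assuming GCC atomic builtins and pthreads (the
file name and details below are mine, not Eric's actual test), could
look like this:

/* atomic.c - N threads hammering one shared counter.
 * Build:  gcc -O2 -pthread atomic.c -o atomic
 * Run:    ./atomic <nthreads>
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define ITERATIONS 10000000

static long shared_counter;                     /* the contended cache line */

static void *worker(void *arg)
{
        int i;

        (void)arg;
        for (i = 0; i < ITERATIONS; i++)
                __sync_fetch_and_add(&shared_counter, 1); /* one locked RMW per loop */
        return NULL;
}

int main(int argc, char **argv)
{
        int i, nthreads = argc > 1 ? atoi(argv[1]) : 1;
        pthread_t *tids = calloc(nthreads, sizeof(*tids));

        for (i = 0; i < nthreads; i++)
                pthread_create(&tids[i], NULL, worker, NULL);
        for (i = 0; i < nthreads; i++)
                pthread_join(tids[i], NULL);
        printf("counter = %ld\n", shared_counter);
        free(tids);
        return 0;
}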

# perf stat ./atomic 16

 Performance counter stats for './atomic 16':

      48016,104204 task-clock                #   15,566 CPUs utilized          
               555 context-switches          #    0,000 M/sec                  
                15 CPU-migrations            #    0,000 M/sec                  
               175 page-faults               #    0,000 M/sec                  
   121 669 943 013 cycles                    #    2,534 GHz                    
   121 321 455 748 stalled-cycles-frontend   #   99,71% frontend cycles idle   
   103 375 494 290 stalled-cycles-backend    #   84,96% backend  cycles idle   
       611 624 619 instructions              #    0,01  insns per cycle        
                                             #   198,36  stalled cycles per insn
       184 530 032 branches                  #    3,843 M/sec                  
           581 513 branch-misses             #    0,32% of all branches        

       3,084672937 seconds time elapsed

Cost per read_lock()/read_unlock() pair: at least 616 ns
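
(Presumably that figure is just elapsed time divided by the per-thread
iteration count: 3.08 s / 10,000,000 ≈ 308 ns per atomic op, times the
two atomic ops in a lock/unlock pair ≈ 616 ns.  The single-CPU run below
works out the same way to ~8.4 ns per op, hence the ~16 ns quoted there.)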

While on one CPU only:

# perf stat ./atomic 1

 Performance counter stats for './atomic 1':

         83,475050 task-clock                #    0,998 CPUs utilized          
                 3 context-switches          #    0,000 M/sec                  
                 1 CPU-migrations            #    0,000 M/sec                  
               144 page-faults               #    0,002 M/sec                  
       211 508 600 cycles                    #    2,534 GHz                    
       193 502 947 stalled-cycles-frontend   #   91,49% frontend cycles idle   
       124 428 400 stalled-cycles-backend    #   58,83% backend  cycles idle   
        30 870 434 instructions              #    0,15  insns per cycle        
                                             #    6,27  stalled cycles per insn
        10 163 364 branches                  #  121,753 M/sec                  
             9 633 branch-misses             #    0,09% of all branches        

       0,083679928 seconds time elapsed

Cost per read_lock()/read_unlock() pair: 16 ns


Of course, bonding could be changed to use RCU as well, if someone feels
the need.

But teaming was designed to be RCU-ready from the beginning.
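
For contrast, a lockless RCU-protected xmit path looks roughly like the
sketch below (illustrative only; the struct field and helper names here
are made up, this is not the actual team or bonding code):

        rcu_read_lock();
        port = rcu_dereference(priv->active_port);      /* plain pointer load:
                                                         * no locked RMW, no
                                                         * shared line bouncing */
        if (port)
                my_xmit_to_port(skb, port);             /* hypothetical helper */
        rcu_read_unlock();

The reader side costs no atomic operations; the writer updates
active_port with rcu_assign_pointer() and frees the old entry only after
a grace period.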


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
