Date:	Wed, 10 Dec 2014 16:56:38 +0000
From:	"cristian.bercaru@...escale.com" <cristian.bercaru@...escale.com>
To:	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
CC:	"R89243@...escale.com" <R89243@...escale.com>,
	Madalin-Cristian Bucur <madalin.bucur@...escale.com>,
	"Razvan.Ungureanu@...escale.com" <Razvan.Ungureanu@...escale.com>
Subject: atomic operations bottleneck in the IPv6 stack


Hello!

I am running IPv6 forwarding tests and I get worse performance with 24 cores than with 16 cores.

Test scenario:
10G --->[T4240]---> 10G
- platform: Freescale T4240, powerpc, 24 x e6500 64-bit cores (I can disable 8 of them from U-Boot)
- input type: raw IPv6 78-byte packets
- input rate: 10Gbps
- forwarding/output rate: 16 cores - 3.3 Gbps; 24 cores - 2.4 Gbps

Profiling with "perf record -C 1 -c 10000000 -a sleep 120", I observe:
- on 16 cores
# Overhead      Command      Shared Object       Symbol
    19.59%  ksoftirqd/1  [kernel.kallsyms]  [k] .ip6_pol_route                      
    18.07%  ksoftirqd/1  [kernel.kallsyms]  [k] .dst_release                        
     5.09%  ksoftirqd/1  [kernel.kallsyms]  [k] .__netif_receive_skb_core           
- on 24 cores
    34.98%  ksoftirqd/1  [kernel.kallsyms]  [k] .ip6_pol_route                      
    31.86%  ksoftirqd/1  [kernel.kallsyms]  [k] .dst_release                        
     3.76%  ksoftirqd/1  [kernel.kallsyms]  [k] .ip6_finish_output2                 
     2.72%  ksoftirqd/1  [kernel.kallsyms]  [k] .__netif_receive_skb_core           

I de-inlined the 'atomic_dec_return' and 'atomic_inc' operations used by 'ip6_pol_route' and 'dst_release' (a sketch of the wrappers is shown after the profiles below) and I get
- on 16 cores
    17.26%  ksoftirqd/1  [kernel.kallsyms]  [k] .atomic_dec_return_noinline         
    13.45%  ksoftirqd/1  [kernel.kallsyms]  [k] .atomic_inc_noinline                
     5.53%  ksoftirqd/1  [kernel.kallsyms]  [k] .ip6_pol_route                      
     5.02%  ksoftirqd/1  [kernel.kallsyms]  [k] .__netif_receive_skb_core           
- on 24 cores
    32.45%  ksoftirqd/1  [kernel.kallsyms]  [k] .atomic_dec_return_noinline         
    30.56%  ksoftirqd/1  [kernel.kallsyms]  [k] .atomic_inc_noinline                
     4.71%  ksoftirqd/1  [kernel.kallsyms]  [k] .ip6_pol_route                      
     3.57%  ksoftirqd/1  [kernel.kallsyms]  [k] .ip6_finish_output2                 
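
Roughly, the de-inlined wrappers are just out-of-line calls to the normal atomics, with the call sites in ip6_pol_route()/dst_release() switched over to them, so that perf attributes their cost to separate symbols (simplified sketch, not the exact patch):

#include <linux/atomic.h>
#include <linux/compiler.h>

/* out-of-line wrapper around the normally-inlined atomic_dec_return() */
noinline int atomic_dec_return_noinline(atomic_t *v)
{
	return atomic_dec_return(v);
}

/* out-of-line wrapper around the normally-inlined atomic_inc() */
noinline void atomic_inc_noinline(atomic_t *v)
{
	atomic_inc(v);
}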

It seems to me that the atomic operations on the IPv6 forwarding path are a bottleneck and do not scale with the number of cores. Am I right? What improvements could be made to the IPv6 kernel code to make it less dependent on atomic operations/variables?
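
For context, the pattern that seems to be contended is the per-dst reference count: every forwarded packet takes a reference on the cached route (ip6_pol_route() via dst_use()/dst_hold()) and drops it in dst_release(), so all cores do atomic read-modify-writes on the same cache line. Simplified from include/net/dst.h and net/core/dst.c in my tree:

/* include/net/dst.h (simplified): reference taken on the lookup path,
 * e.g. via dst_use() at the end of ip6_pol_route() */
static inline void dst_hold(struct dst_entry *dst)
{
	atomic_inc(&dst->__refcnt);
}

/* net/core/dst.c (simplified): reference dropped once the skb is sent */
void dst_release(struct dst_entry *dst)
{
	if (dst) {
		int newrefcnt = atomic_dec_return(&dst->__refcnt);
		WARN_ON(newrefcnt < 0);
	}
}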

Thank you,
Cristian Bercaru

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
