Message-ID: <SN2PR03MB079DA7799199CD98C52B5389E620@SN2PR03MB079.namprd03.prod.outlook.com>
Date: Wed, 10 Dec 2014 16:56:38 +0000
From: "cristian.bercaru@...escale.com" <cristian.bercaru@...escale.com>
To: "netdev@...r.kernel.org" <netdev@...r.kernel.org>
CC: "R89243@...escale.com" <R89243@...escale.com>,
Madalin-Cristian Bucur <madalin.bucur@...escale.com>,
"Razvan.Ungureanu@...escale.com" <Razvan.Ungureanu@...escale.com>
Subject: atomic operations bottleneck in the IPv6 stack
Hello!
I am running IPv6 forwarding test cases and I get worse performance with 24 cores than with 16 cores.
Test scenario:
10G --->[T4240]---> 10G
- platform: Freescale T4240, powerpc, 24 x e6500 64-bit cores (I can disable 8 of them from uboot)
- input type: raw IPv6 78-byte packets
- input rate: 10Gbps
- forwarding/output rate: 16 cores - 3.3 Gbps; 24 cores - 2.4 Gbps
Running "perf record -C 1 -c 10000000 -a sleep 120" I observe:
- on 16 cores

# Overhead  Command      Shared Object      Symbol
    19.59%  ksoftirqd/1  [kernel.kallsyms]  [k] .ip6_pol_route
    18.07%  ksoftirqd/1  [kernel.kallsyms]  [k] .dst_release
     5.09%  ksoftirqd/1  [kernel.kallsyms]  [k] .__netif_receive_skb_core

- on 24 cores

# Overhead  Command      Shared Object      Symbol
    34.98%  ksoftirqd/1  [kernel.kallsyms]  [k] .ip6_pol_route
    31.86%  ksoftirqd/1  [kernel.kallsyms]  [k] .dst_release
     3.76%  ksoftirqd/1  [kernel.kallsyms]  [k] .ip6_finish_output2
     2.72%  ksoftirqd/1  [kernel.kallsyms]  [k] .__netif_receive_skb_core
I de-inlined 'atomic_dec_return' and 'atomic_inc', which are used by 'ip6_pol_route' and 'dst_release' (a sketch of the wrappers is below the profiles), and I get:
- on 16 cores

# Overhead  Command      Shared Object      Symbol
    17.26%  ksoftirqd/1  [kernel.kallsyms]  [k] .atomic_dec_return_noinline
    13.45%  ksoftirqd/1  [kernel.kallsyms]  [k] .atomic_inc_noinline
     5.53%  ksoftirqd/1  [kernel.kallsyms]  [k] .ip6_pol_route
     5.02%  ksoftirqd/1  [kernel.kallsyms]  [k] .__netif_receive_skb_core

- on 24 cores

# Overhead  Command      Shared Object      Symbol
    32.45%  ksoftirqd/1  [kernel.kallsyms]  [k] .atomic_dec_return_noinline
    30.56%  ksoftirqd/1  [kernel.kallsyms]  [k] .atomic_inc_noinline
     4.71%  ksoftirqd/1  [kernel.kallsyms]  [k] .ip6_pol_route
     3.57%  ksoftirqd/1  [kernel.kallsyms]  [k] .ip6_finish_output2
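
For reference, the de-inlining was done with wrappers roughly along these lines (just a sketch, not the exact patch; it assumes the usual <linux/atomic.h> and <linux/compiler.h> definitions and that the wrappers are visible to both net/ipv6/route.c and net/core/dst.c):

#include <linux/atomic.h>
#include <linux/compiler.h>

/* noinline wrappers so that the time spent in the atomic operations
 * shows up as separate symbols in perf instead of being folded into
 * the callers (ip6_pol_route, dst_release) */
noinline int atomic_dec_return_noinline(atomic_t *v)
{
	return atomic_dec_return(v);
}

noinline void atomic_inc_noinline(atomic_t *v)
{
	atomic_inc(v);
}

The callers were then switched to these wrappers instead of the inlined atomics, which is why the profiles show .atomic_dec_return_noinline and .atomic_inc_noinline.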
It seems to me that the atomic operations on the IPv6 forwarding path are a bottleneck and that they do not scale with the number of cores. Am I right? What improvements could be made to the IPv6 kernel code to make it less dependent on atomic operations/variables?
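
To illustrate the kind of effect I suspect, here is a small user-space sketch (not kernel code; the thread and iteration counts are arbitrary) in which every thread does an atomic inc/dec pair on a single shared counter, standing in for a shared refcount like dst->__refcnt being touched per packet by every core:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

#define ITERS 10000000L
#define MAX_THREADS 64

/* one counter shared by all threads -> every atomic op bounces its
 * cache line between cores, so adding cores adds contention */
static atomic_long shared_refcnt;

static void *worker(void *arg)
{
	(void)arg;
	for (long i = 0; i < ITERS; i++) {
		atomic_fetch_add(&shared_refcnt, 1);	/* like a dst hold */
		atomic_fetch_sub(&shared_refcnt, 1);	/* like a dst release */
	}
	return NULL;
}

int main(int argc, char **argv)
{
	int nthreads = argc > 1 ? atoi(argv[1]) : 16;
	pthread_t tid[MAX_THREADS];

	if (nthreads > MAX_THREADS)
		nthreads = MAX_THREADS;

	for (int i = 0; i < nthreads; i++)
		pthread_create(&tid[i], NULL, worker, NULL);
	for (int i = 0; i < nthreads; i++)
		pthread_join(tid[i], NULL);

	printf("%d threads done, refcnt=%ld\n",
	       nthreads, atomic_load(&shared_refcnt));
	return 0;
}

(built with "gcc -O2 -pthread"; I would expect the time per thread to grow as threads are added, which is the same shape as the 16-core vs 24-core numbers above)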
Thank you,
Cristian Bercaru