[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b3acabde-cd6d-589c-05f9-a2e75db94b6b@candelatech.com>
Date: Wed, 9 May 2018 12:02:34 -0700
From: Ben Greear <greearb@...delatech.com>
To: Eric Dumazet <eric.dumazet@...il.com>,
netdev <netdev@...r.kernel.org>
Subject: Re: Performance regression between 4.13 and 4.14
On 05/09/2018 11:48 AM, Eric Dumazet wrote:
>
>
> On 05/09/2018 11:43 AM, Ben Greear wrote:
>> On 05/08/2018 10:10 AM, Eric Dumazet wrote:
>>>
>>>
>>> On 05/08/2018 09:44 AM, Ben Greear wrote:
>>>> Hello,
>>>>
>>>> I am trying to track down a performance regression that appears to be between 4.13
>>>> and 4.14.
>>>>
>>>> I first saw the problem with a hacked version of pktgen on some ixgbe NICs. 4.13 can do
>>>> right at 10G bi-directional on two ports, and 4.14 and later can do only about 6Gbps.
>>>>
>>>> I also tried with user-space UDP traffic on a stock kernel, and I can get about 3.2Gbps combined tx+rx
>>>> on 4.14 and about 4.4Gbps on 4.13.
>>>>
>>>> Attempting to bisect seems to be triggering a weirdness in git, and also lots of commits
>>>> crash or do not bring up networking, which makes the bisect difficult.
>>>>
>>>> Looking at perf top, it would appear that some lock is probably to blame.
>>>
>>>
>>> perf record -a -g -e cycles:pp sleep 5
>>> perf report
>>>
>>> Then you'll be able to tell us which lock (or call graph) is killing your perf.
>>>
>>
>> I seem to be chasing multiple issues. For 4.13, at least part of my problem was that LOCKDEP was enabled,
>> during my bisect, though it does NOT appear enabled in 4.16. I think maybe CONFIG_LOCKDEP moved to CONFIG_PROVE_LOCKING
>> in 4.16, or something like that? My 4.16 .config does have CONFIG_LOCKDEP_SUPPORT enabled, and I see no option to disable it:
>>
>> [greearb@...-dt3 linux-4.16.x64]$ grep LOCKDEP .config
>> CONFIG_LOCKDEP_SUPPORT=y
>>
>>
>> For 4.16, I am disabling RETRAMPOLINE...are there any other such things I need
>> to disable to keep from getting a performance hit from the spectre-related bug
>> fixes? At this point, I do not care about the security implications.
>>
>> greearb@...-dt3 linux-4.16.x64]$ grep RETPO .config
>> # CONFIG_RETPOLINE is not set
>>
>>
>> Thanks,
>> Ben
>>
>
> No idea really, you mention a 4.13 -> 4.14 regression and jump then to 4.16 :/
I initially saw the problem in 4.16, then bisected, and 4.14 still showed the
issue.
4.13 works, but only when I use a .config I originally built for 4.13, not the 4.16 .config
that I ended up using with the bisect (make oldconfig, accept all defaults). I originally
configured 4.16 with a .config that had lockdep enabled, then manually tried to disable it
through 'make xconfig'. I think that must leave "CONFIG_LOCKDEP=y" in the .config, which
screws up older builds during bisect, perhaps?
> Before doing a (painful) dissection, the perf output would immediately tell you if
> something is really wrong on your .config.
I didn't realize lockdep might be an issue at the time, but here is a 'bad' run from
a 4.13+ (plus pktgen hacks). I guess lockdep is why this runs slowly, but I see no obvious
proof of that in the output:
4.13+, patched pktgen, 6Gbps throughput, on commit 906dde0f355bd97c080c215811ae7db1137c4af8
Samples: 26K of event 'cycles:pp', Event count (approx.): 20119166736
Children Self Command Shared Object Symbol
+ 87.97% 0.00% kpktgend_1 [kernel.kallsyms] [k] ret_from_fork
+ 87.97% 0.00% kpktgend_1 [kernel.kallsyms] [k] kthread
+ 86.89% 5.42% kpktgend_1 [kernel.kallsyms] [k] pktgen_thread_worker
+ 33.75% 0.18% kpktgend_1 [kernel.kallsyms] [k] getnstimeofday64
+ 32.77% 4.47% kpktgend_1 [kernel.kallsyms] [k] __getnstimeofday64
+ 24.60% 10.91% kpktgend_1 [kernel.kallsyms] [k] lock_acquire
+ 23.59% 0.03% kpktgend_1 [kernel.kallsyms] [k] __do_softirq
+ 23.55% 0.07% kpktgend_1 [kernel.kallsyms] [k] net_rx_action
+ 22.29% 0.47% kpktgend_1 [kernel.kallsyms] [k] getRelativeCurNs
+ 21.33% 1.71% kpktgend_1 [kernel.kallsyms] [k] ixgbe_poll
+ 15.79% 0.02% kpktgend_1 [kernel.kallsyms] [k] ret_from_intr
+ 15.78% 0.01% kpktgend_1 [kernel.kallsyms] [k] do_IRQ
+ 15.34% 0.01% kpktgend_1 [kernel.kallsyms] [k] irq_exit
+ 13.95% 10.00% kpktgend_1 [kernel.kallsyms] [k] ip_send_check
+ 13.80% 13.80% kpktgend_1 [kernel.kallsyms] [k] __lock_acquire.isra.31
+ 12.98% 0.53% kpktgend_1 [kernel.kallsyms] [k] pktgen_finalize_skb
+ 12.31% 0.20% kpktgend_1 [kernel.kallsyms] [k] timestamp_skb.isra.24
+ 11.68% 0.13% kpktgend_1 [kernel.kallsyms] [k] napi_gro_receive
+ 11.36% 0.25% kpktgend_1 [kernel.kallsyms] [k] netif_receive_skb_internal
+ 10.93% 0.00% swapper [kernel.kallsyms] [k] verify_cpu
+ 10.93% 0.00% swapper [kernel.kallsyms] [k] cpu_startup_entry
+ 10.92% 0.02% swapper [kernel.kallsyms] [k] do_idle
+ 10.71% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter
+ 10.71% 0.00% swapper [kernel.kallsyms] [k] call_cpuidle
+ 10.66% 0.06% swapper [kernel.kallsyms] [k] cpuidle_enter_state
+ 9.78% 0.00% swapper [kernel.kallsyms] [k] x86_64_start_kernel
+ 9.78% 0.00% swapper [kernel.kallsyms] [k] x86_64_start_reservations
+ 9.78% 0.00% swapper [kernel.kallsyms] [k] start_kernel
+ 9.78% 0.00% swapper [kernel.kallsyms] [k] rest_init
+ 9.25% 0.00% swapper [kernel.kallsyms] [k] ret_from_intr
+ 9.25% 0.00% swapper [kernel.kallsyms] [k] do_IRQ
+ 9.24% 0.30% kpktgend_1 [kernel.kallsyms] [k] pktgen_alloc_skb
+ 9.14% 0.00% swapper [kernel.kallsyms] [k] irq_exit
+ 8.89% 0.00% swapper [kernel.kallsyms] [k] __do_softirq
+ 8.82% 0.01% swapper [kernel.kallsyms] [k] net_rx_action
+ 8.80% 0.47% kpktgend_1 [kernel.kallsyms] [k] __local_bh_enable_ip
+ 8.67% 6.87% kpktgend_1 [kernel.kallsyms] [k] lock_release
+ 8.57% 1.10% kpktgend_1 [kernel.kallsyms] [k] __netdev_alloc_skb
+ 8.28% 0.00% kpktgend_1 [kernel.kallsyms] [k] do_softirq.part.17
+ 8.28% 0.00% kpktgend_1 [kernel.kallsyms] [k] do_softirq_own_stack
+ 8.01% 0.74% swapper [kernel.kallsyms] [k] ixgbe_poll
+ 6.85% 2.28% kpktgend_1 [kernel.kallsyms] [k] __build_skb
+ 4.89% 0.07% kpktgend_1 [kernel.kallsyms] [k] __netif_receive_skb
+ 4.87% 0.75% kpktgend_1 [kernel.kallsyms] [k] __netif_receive_skb_core
+ 4.66% 0.16% kpktgend_1 [kernel.kallsyms] [k] _raw_spin_lock
+ 4.37% 0.06% swapper [kernel.kallsyms] [k] napi_gro_receive
+ 4.23% 0.10% swapper [kernel.kallsyms] [k] netif_receive_skb_internal
Here is a 'bad' run of 4.16.0 + my full out-of-tree patches, including my normal pktgen changes
(pktgen code should be nearly identical between the run above and the run here)
Samples: 26K of event 'cycles:pp', Event count (approx.): 19888249841
Children Self Command Shared Object Symbol
+ 74.24% 5.38% kpktgend_1 [kernel.kallsyms] [k] pktgen_thread_worker
+ 73.89% 0.00% kpktgend_1 [kernel.kallsyms] [k] ret_from_fork
+ 73.89% 0.00% kpktgend_1 [kernel.kallsyms] [k] kthread
+ 27.31% 0.25% kpktgend_1 [kernel.kallsyms] [k] getnstimeofday64
+ 26.15% 4.24% kpktgend_1 [kernel.kallsyms] [k] __getnstimeofday64
+ 22.64% 0.02% kpktgend_1 [kernel.kallsyms] [k] __softirqentry_text_s
+ 22.62% 0.06% kpktgend_1 [kernel.kallsyms] [k] net_rx_action
+ 22.59% 2.04% kpktgend_1 [kernel.kallsyms] [k] ixgbe_poll
+ 21.49% 7.92% kpktgend_1 [kernel.kallsyms] [k] lock_acquire
+ 17.78% 17.78% kpktgend_1 [kernel.kallsyms] [k] __lock_acquire.isra.3
+ 15.95% 0.39% kpktgend_1 [kernel.kallsyms] [k] getRelativeCurNs
+ 14.49% 0.02% kpktgend_1 [kernel.kallsyms] [k] ret_from_intr
+ 14.48% 0.00% kpktgend_1 [kernel.kallsyms] [k] do_IRQ
+ 14.35% 0.01% kpktgend_1 [kernel.kallsyms] [k] irq_exit
+ 13.46% 9.90% kpktgend_1 [kernel.kallsyms] [k] ip_send_check
+ 11.36% 0.18% kpktgend_1 [kernel.kallsyms] [k] napi_gro_receive
+ 11.25% 0.17% kpktgend_1 [kernel.kallsyms] [k] netif_receive_skb_int
+ 10.37% 0.16% kpktgend_1 [kernel.kallsyms] [k] timestamp_skb.isra.25
+ 8.85% 0.45% kpktgend_1 [kernel.kallsyms] [k] __local_bh_enable_ip
+ 8.61% 0.02% swapper [kernel.kallsyms] [k] do_idle
+ 8.60% 0.00% swapper [kernel.kallsyms] [k] secondary_startup_64
+ 8.60% 0.00% swapper [kernel.kallsyms] [k] cpu_startup_entry
+ 8.37% 0.05% swapper [kernel.kallsyms] [k] cpuidle_enter_state
+ 8.35% 7.57% kpktgend_1 [kernel.kallsyms] [k] lock_release
+ 8.32% 0.00% kpktgend_1 [kernel.kallsyms] [k] do_softirq.part.16
+ 8.31% 0.00% kpktgend_1 [kernel.kallsyms] [k] do_softirq_own_stack
+ 7.16% 0.36% kpktgend_1 [kernel.kallsyms] [k] pktgen_alloc_skb
+ 7.09% 1.14% kpktgend_1 [kernel.kallsyms] [k] __netdev_alloc_skb
+ 6.87% 0.00% swapper [kernel.kallsyms] [k] start_kernel
+ 6.55% 0.00% swapper [kernel.kallsyms] [k] ret_from_intr
+ 6.55% 0.01% swapper [kernel.kallsyms] [k] do_IRQ
+ 6.51% 0.01% swapper [kernel.kallsyms] [k] irq_exit
+ 6.34% 0.02% swapper [kernel.kallsyms] [k] __softirqentry_text_s
+ 6.25% 0.01% swapper [kernel.kallsyms] [k] net_rx_action
+ 6.06% 0.45% swapper [kernel.kallsyms] [k] ixgbe_poll
+ 5.35% 2.11% kpktgend_1 [kernel.kallsyms] [k] __build_skb
I tried testing with stock kernels and UDP user-space traffic, but results varied widely
from run to run, so it was useless for bisecting.
I have irqbalance disabled on this system in order to pin pktgen process and ixgbe IRQs
to be efficient, so maybe that is why user-space UDP traffic works so unreliably. My hacked pktgen
reliably shows good on my good 4.13 kernel and reliably bad on higher kernels I have compiled
so far, so I decided it was the only useful way to bisect this problem.
I will now try 4.14 and higher again with lockdep definitely disabled.
Thanks,
Ben
--
Ben Greear <greearb@...delatech.com>
Candela Technologies Inc http://www.candelatech.com
Powered by blists - more mailing lists