Message-ID: <ZfG9XFwXp_d5E0qc@localhost.localdomain>
Date: Wed, 13 Mar 2024 15:51:08 +0100
From: Frederic Weisbecker <frederic@...nel.org>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: kernel test robot <oliver.sang@...el.com>,
Anna-Maria Behnsen <anna-maria@...utronix.de>,
oe-lkp@...ts.linux.dev, lkp@...el.com, linux-kernel@...r.kernel.org,
x86@...nel.org, ying.huang@...el.com, feng.tang@...el.com,
fengwei.yin@...el.com
Subject: Re: [tip:timers/core] [timers] 7ee9887703: netperf.Throughput_Mbps
-1.2% regression
On Wed, Mar 13, 2024 at 09:25:51AM +0100, Thomas Gleixner wrote:
> On Wed, Mar 13 2024 at 00:57, Frederic Weisbecker wrote:
> > So I can reproduce. And after hours of staring at traces I haven't really found
> > the real cause of this. A 1% difference is not always easy to track down.
> > But here are some tentative conclusions so far:
> >
> > _ There is an increase in ksoftirqd usage (+13%), but if I boot with threadirqs
> > both before and after the patch (which means that ksoftirqd handles all
> > softirqs all the time) I still see the performance regression. So this
> > shouldn't play a role here.
> >
> > _ I suspected that timer migrators handling big queues of timers on behalf of
> > idle CPUs would delay NET_RX softirqs, but that doesn't seem to be the case. I
> > don't see the TIMER vector delaying the NET_RX vector after the hierarchical
> > pull model; quite the opposite actually, they are delayed less overall.
> >
> > _ I suspected that timer migrators handling big queues would add scheduling
> > latency. But it doesn't seem to be the case. Quite the opposite again,
> > surprisingly.
> >
> > _ I have observed that, on average, timers execute later with the hierarchical
> > pull model. The following delta:
> > time of callback execution - bucket_expiry
> > is 3 times higher with the hierarchical pull model. Whether that plays a role
> > is unclear. It might still be interesting to investigate (a rough measurement
> > sketch follows below).
> >
> > _ The initial perf profile seems to suggest a big increase in task migration. Is
> > it the result of ping-pong wakeups? Does that play a role?
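
Purely as an illustration (this wasn't part of the original measurement): that
delta could be estimated from a text trace dump roughly as below. It assumes
the timer_start tracepoint prints a bucket_expiry= field and timer_expire_entry
prints now= (both in jiffies), as recent kernels do; adjust the matching if the
tracepoint format differs.

#!/usr/bin/env python3
# Rough sketch: estimate "callback execution time - bucket_expiry" from a
# text ftrace dump. Assumes timer_start lines carry bucket_expiry= and
# timer_expire_entry lines carry now= (both in jiffies).
import re
import sys

start_re = re.compile(r'timer_start:.*timer=(\S+).*bucket_expiry=(\d+)')
expire_re = re.compile(r'timer_expire_entry:.*timer=(\S+).*now=(\d+)')

armed = {}      # timer address -> bucket_expiry of the last arming
deltas = []

with open(sys.argv[1]) as trace:
    for line in trace:
        m = start_re.search(line)
        if m:
            armed[m.group(1)] = int(m.group(2))
            continue
        m = expire_re.search(line)
        if m and m.group(1) in armed:
            # Jiffies between the actual expiry and the bucket the timer
            # was queued for.
            deltas.append(int(m.group(2)) - armed.pop(m.group(1)))

if deltas:
    print(f"{len(deltas)} timers, avg delta {sum(deltas) / len(deltas):.2f} jiffies,"
          f" max {max(deltas)} jiffies")

Running it against a "good" and a "bad" trace separately would give the
averages being compared above.
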
>
> Migration is not cheap. The interesting question is whether this is
> caused by remote timer expiry.
>
> Looking at the perf data there are significant changes vs. idle too:
>
> perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
> 36.91 ± 2% -12.6 24.32 ± 10% -12.3 24.63 ± 5%
>
> That indicates that cpuidle is spending less time in idle polling, which
> means that wakeup latency increases. That obviously might be a result of
> the timer migration properties.
Hmm, looking at the report, I'm reading the reverse.

More idle polling:

      0.00           +13.2      13.15 ± 49%     +11.3      11.25 ± 55%  perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle

And fewer C3:

     31.82 ±  3%     -13.0      18.83 ± 12%     -13.2      18.65 ±  6%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter

And indeed I would have expected the reverse...
> Do you have traces (before and after) handy to share?
Sure. Here are two snapshots. trace.good is before the pull model and trace.bad
is after. The traces contain:
* sched_switch / sched_wakeup
* timer start and expire_entry
* softirq raise / entry / exit
* tmigr:*
* cpuidle
The latter is disappointing though, because the CPUs only ever enter C1 in my
traces. Likely due to running under KVM...
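
In case it helps reproduce the capture, here is a minimal sketch (illustrative
only, assuming tracefs is mounted at /sys/kernel/tracing and run as root) that
enables the events listed above; the "tmigr" group only exists on kernels with
the hierarchical pull model:

#!/usr/bin/env python3
# Minimal sketch: enable the trace events listed above via tracefs.
# Assumes tracefs is mounted at /sys/kernel/tracing; must run as root.
from pathlib import Path

TRACEFS = Path("/sys/kernel/tracing")

EVENTS = [
    "sched/sched_switch", "sched/sched_wakeup",
    "timer/timer_start", "timer/timer_expire_entry",
    "irq/softirq_raise", "irq/softirq_entry", "irq/softirq_exit",
    "tmigr",                 # whole tmigr:* group
    "power/cpu_idle",        # cpuidle
]

for ev in EVENTS:
    enable = TRACEFS / "events" / ev / "enable"
    if enable.exists():
        enable.write_text("1")
    else:
        print(f"event group not available: {ev}")

# Start tracing; the result can be read back from the "trace" file.
(TRACEFS / "tracing_on").write_text("1")
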
Thanks.
Download attachment "trace.good.xz" of type "application/x-xz" (4622048 bytes)