Message-ID: <87y1amo7w0.ffs@tglx>
Date: Wed, 13 Mar 2024 09:25:51 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Frederic Weisbecker <frederic@...nel.org>, kernel test robot
<oliver.sang@...el.com>
Cc: Anna-Maria Behnsen <anna-maria@...utronix.de>, oe-lkp@...ts.linux.dev,
lkp@...el.com, linux-kernel@...r.kernel.org, x86@...nel.org,
ying.huang@...el.com, feng.tang@...el.com, fengwei.yin@...el.com
Subject: Re: [tip:timers/core] [timers] 7ee9887703:
netperf.Throughput_Mbps -1.2% regression

On Wed, Mar 13 2024 at 00:57, Frederic Weisbecker wrote:
> So I can reproduce. And after hours of staring at traces I haven't really
> found the real cause of this. A 1% difference is not always easy to track
> down. But here are some conclusions so far:
>
> _ There is an increase in ksoftirqd use (+13%), but if I boot with threadirqs
> before and after the patch (which means that ksoftirqd is used all the time
> for softirq handling) I still see the performance regression. So this
> shouldn't play a role here.
>
> _ I suspected that timer migrators handling big queues of timers on behalf of
> idle CPUs would delay NET_RX softirqs, but it doesn't seem to be the case. I
> don't see the TIMER vector delaying the NET_RX vector after the hierarchical
> pull model; quite the opposite actually, they are less delayed overall.
>
> _ I suspected that timer migrators handling big queues would add scheduling
> latency. But it doesn't seem to be the case. Quite the opposite again,
> surprisingly.
>
> _ I have observed that, on average, timers execute later with the hierarchical
> pull model. The following delta:
> time of callback execution - bucket_expiry
> is 3 times higher with the hierarchical pull model. Whether that plays a role
> is unclear. It might still be interesting to investigate.
>
> _ The initial perf profile seems to suggest a big increase in task migrations.
> Is it the result of ping-pong wakeups? Does that play a role?

Migration is not cheap. The interesting question is whether this is
caused by remote timer expiry.
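
Whether the migration count itself really goes up is easy to double check
outside of the lkp profile. A minimal user-space sketch (illustrative only,
not taken from the report; the sleep() is a stand-in for the actual netperf
run) which reads the PERF_COUNT_SW_CPU_MIGRATIONS software counter via
perf_event_open(), i.e. the same counter that perf stat prints as
cpu-migrations:

/*
 * Minimal sketch: count how often the calling process is migrated
 * between CPUs while a workload runs, using the
 * PERF_COUNT_SW_CPU_MIGRATIONS software event.
 */
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static long sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
                                int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
    struct perf_event_attr attr;
    uint64_t count;
    int fd;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_SOFTWARE;
    attr.config = PERF_COUNT_SW_CPU_MIGRATIONS;
    attr.disabled = 1;

    /* pid = 0, cpu = -1: this process, on any CPU */
    fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
    if (fd < 0) {
        perror("perf_event_open");
        return 1;
    }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    sleep(10);    /* placeholder for the actual workload run */

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    if (read(fd, &count, sizeof(count)) != sizeof(count)) {
        perror("read");
        return 1;
    }
    printf("cpu-migrations: %llu\n", (unsigned long long)count);
    return 0;
}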

Looking at the perf data, there are significant changes vs. idle too:
perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
36.91 ± 2% -12.6 24.32 ± 10% -12.3 24.63 ± 5%

That indicates that cpuidle is spending less time in idle polling, which
means that wakeup latency increases. That obviously might be a result of
the timer migration properties.
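
To make the poll_idle() angle concrete: the polling state spins on
need_resched(), so a wakeup is noticed with essentially zero exit latency,
while anything deeper has to pay the full idle exit plus wakeup path. A rough
user-space analogy (purely illustrative, not the cpuidle code): one thread
waits for a flag either by busy-polling an atomic or by sleeping on a
condition variable, and the observed wakeup latency differs accordingly.

/*
 * Rough user-space analogy for polling vs. sleeping wakeups.
 * Purely illustrative; this is not how cpuidle/poll_idle works.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

static atomic_int flag;
static struct timespec t_wake, t_seen;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int use_polling;

static void *waiter(void *arg)
{
    if (use_polling) {
        /* "poll_idle": spin until the flag is set */
        while (!atomic_load_explicit(&flag, memory_order_acquire))
            ;
    } else {
        /* "deeper idle state": sleep until woken */
        pthread_mutex_lock(&lock);
        while (!atomic_load(&flag))
            pthread_cond_wait(&cond, &lock);
        pthread_mutex_unlock(&lock);
    }
    clock_gettime(CLOCK_MONOTONIC, &t_seen);
    return NULL;
}

static long long run_once(int polling)
{
    pthread_t tid;
    struct timespec settle = { 0, 10 * 1000 * 1000 };   /* 10ms */

    use_polling = polling;
    atomic_store(&flag, 0);
    pthread_create(&tid, NULL, waiter, NULL);
    nanosleep(&settle, NULL);           /* let the waiter settle */

    clock_gettime(CLOCK_MONOTONIC, &t_wake);
    pthread_mutex_lock(&lock);
    atomic_store_explicit(&flag, 1, memory_order_release);
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);

    pthread_join(tid, NULL);
    return (t_seen.tv_sec - t_wake.tv_sec) * 1000000000LL +
           (t_seen.tv_nsec - t_wake.tv_nsec);
}

int main(void)
{
    printf("polling wakeup:  %lld ns\n", run_once(1));
    printf("sleeping wakeup: %lld ns\n", run_once(0));
    return 0;
}

The absolute numbers are meaningless; the point is merely that losing the
polling state turns an almost free wakeup into one which pays the full
wakeup path.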

Do you have traces (before and after) handy to share?

Thanks,
tglx