linux-kernel - Re: Very high scheduling delay with plenty of idle CPUs

Message-ID: <CAGETcx_nVKYMhCmC6BPNVxLfDaz=uoSsk1WOs-aX=M03Ew2qTA@mail.gmail.com>
Date: Wed, 13 Nov 2024 22:36:31 -0800
From: Saravana Kannan <saravanak@...gle.com>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Ingo Molnar <mingo@...hat.com>, "Peter Zijlstra (Intel)" <peterz@...radead.org>, 
	Juri Lelli <juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>, 
	Steven Rostedt <rostedt@...dmis.org>, Benjamin Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Valentin Schneider <vschneid@...hat.com>, LKML <linux-kernel@...r.kernel.org>, 
	wuyun.abel@...edance.com, youssefesmat@...omium.org, 
	Thomas Gleixner <tglx@...utronix.de>, efault@....de, 
	K Prateek Nayak <kprateek.nayak@....com>, John Stultz <jstultz@...gle.com>, 
	Vincent Palomares <paillon@...gle.com>
Subject: Re: Very high scheduling delay with plenty of idle CPUs

Ugh... I just realized that for a few of these emails I've been replying
directly to one person instead of replying to all.

On Fri, Nov 8, 2024 at 1:02 AM Vincent Guittot
<vincent.guittot@...aro.org> wrote:
>
> On Fri, 8 Nov 2024 at 08:28, Saravana Kannan <saravanak@...gle.com> wrote:
> >
> > Hi scheduler folks,
> >
> > I'm running into some weird scheduling issues while testing non-sched
> > changes on a Pixel 6 running a kernel close to 6.12-rc5. I'm not sure
> > whether this also happens on earlier kernel versions.
> >
> > The async suspend/resume code calls async_schedule_dev_nocall() to
> > queue up a bunch of work. This queued-up work seems to run in kworker
> > threads.
> >
> > However, I often see a scheduling latency (runnable, but not running)
> > of 4.5 ms or more for a kworker thread while plenty of CPUs are
> > idle.
>
> You are using EAS, aren't you?
> So the energy impact drives the CPU selection, not CPU idleness.
>
> There is a proposal to change feec to also take such cases into account,
> in addition to the energy impact:
> https://lore.kernel.org/lkml/64ed0fb8-12ea-4452-9ec2-7ad127b65529@arm.com/T/
>
> I still have to finalize v2.

Anyway, I tried this series (from
https://git.linaro.org/people/vincent.guittot/kernel.git/log/?h=sched/rework-eas)
and:
1. The timing hasn't improved at all compared to running without the series.
2. Runnable tasks are still frequently preempted while some CPUs sit empty.

For example:
https://ui.perfetto.dev/#!/?s=955ff7e73edf32dab27501025211fa2ce322f725

Thanks,
Saravana


>
> >
> > Does async_schedule_dev_nocall() have some weird limitations on where
> > the queued work can run? I know it has some NUMA-related handling, but
> > the Pixel 6 doesn't have NUMA. This oddity unnecessarily increases
> > suspend/resume latency because the delays add up across kworker
> > threads. I'd appreciate any insights into what might be happening.
> >
> > If you know how to use Perfetto (it's pretty simple; all you need to
> > know is WASD and clicking), here's an example:
> > https://ui.perfetto.dev/#!/?s=e20045736e7dfa1e897db6489710061d2495be92
> >
> > Thanks,
> > Saravana