Message-ID: <CAKfTPtDtmtW7EKte9vwPUxYKfCGJuTDGudnz342p6sTDsk1qUg@mail.gmail.com>
Date: Thu, 14 Nov 2024 14:06:36 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Saravana Kannan <saravanak@...gle.com>
Cc: Ingo Molnar <mingo@...hat.com>, "Peter Zijlstra (Intel)" <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Benjamin Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>, LKML <linux-kernel@...r.kernel.org>,
wuyun.abel@...edance.com, youssefesmat@...omium.org,
Thomas Gleixner <tglx@...utronix.de>, efault@....de,
K Prateek Nayak <kprateek.nayak@....com>, John Stultz <jstultz@...gle.com>,
Vincent Palomares <paillon@...gle.com>
Subject: Re: Very high scheduling delay with plenty of idle CPUs
On Thu, 14 Nov 2024 at 07:37, Saravana Kannan <saravanak@...gle.com> wrote:
>
> Ugh... just realized that for a few of the emails I've been replying
> directly to one person instead of reply-all.
>
> On Fri, Nov 8, 2024 at 1:02 AM Vincent Guittot
> <vincent.guittot@...aro.org> wrote:
> >
> > On Fri, 8 Nov 2024 at 08:28, Saravana Kannan <saravanak@...gle.com> wrote:
> > >
> > > Hi scheduler folks,
> > >
> > > I'm running into some weird scheduling issues when testing non-sched
> > > changes on a Pixel 6 that's running close to 6.12-rc5. I'm not sure if
> > > this is an issue in earlier kernel versions or not.
> > >
> > > The async suspend/resume code calls async_schedule_dev_nocall() to
> > > queue up a bunch of work. This queued-up work seems to be running in
> > > kworker threads.
> > >
> > > However, there have been many times where I see scheduling latency
> > > (runnable, but not running) of 4.5 ms or higher for a kworker thread
> > > when there are plenty of idle CPUs.
> >
> > You are using EAS, aren't you?
> > So the energy impact drives the CPU selection, not CPU idleness.
> >
> > There is a proposal to change feec() (find_energy_efficient_cpu()) to
> > also take such cases into account, in addition to the energy impact:
> > https://lore.kernel.org/lkml/64ed0fb8-12ea-4452-9ec2-7ad127b65529@arm.com/T/
> >
> > I still have to finalize v2
>
> Anyway, I tried this series (got it from
> https://git.linaro.org/people/vincent.guittot/kernel.git/log/?h=sched/rework-eas)
> and:
> 1. The timing hasn't improved at all compared to not having the series.
Surprising, as I can see improvements on rb5, where unbound kworkers
are spread across the little CPUs, unlike with the current
implementation. But the use of the medium and big cores waits for the
little CPUs to be filled first, which is not the case when EAS is
disabled.
> 2. There's still a lot of preemption of runnable tasks with some empty CPUs.
Yes, the little CPUs are filled first; the medium and big CPUs are only
used later, once the utilization of the little CPUs has increased.
>
> For example:
> https://ui.perfetto.dev/#!/?s=955ff7e73edf32dab27501025211fa2ce322f725
>
> Thanks,
> Saravana
>
>
> >
> > >
> > > Does async_schedule_dev_nocall() have some weird limitations on where
> > > they can be run? I know it has some NUMA related stuff, but the Pixel
> > > 6 doesn't have NUMA. This oddity unnecessarily increases
> > > suspend/resume latency as it adds up across kworker threads. So, I'd
> > > appreciate any insights into what might be happening.
> > >
> > > If you know how to use perfetto (it's really pretty simple, all you
> > > need to know is WASD and clicking), here's an example:
> > > https://ui.perfetto.dev/#!/?s=e20045736e7dfa1e897db6489710061d2495be92
> > >
> > > Thanks,
> > > Saravana