Message-ID: <CAKfTPtBE0_77+J-A7vWRKsHCCmuX1jWTbPYWGVPg1MYq_rv8Og@mail.gmail.com>
Date: Thu, 26 Jun 2025 16:26:00 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Chris Mason <clm@...a.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Chris Mason <clm@...com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC] sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails
On Thu, 26 Jun 2025 at 12:58, Chris Mason <clm@...a.com> wrote:
>
> On 6/26/25 3:00 AM, Peter Zijlstra wrote:
> > On Tue, Jun 24, 2025 at 01:48:08PM -0700, Chris Mason wrote:
>
> [ ... ]
>
> > For the non-RFC version, please split this into a code move and a code
> > change -- I had to stare waaay too long to spot the difference (if we
> > keep this code movement at all).
>
> Sure
>
> >
> >> /*
> >> * Check this_cpu to ensure it is balanced within domain. Attempt to move
> >> * tasks if there is an imbalance.
> >> @@ -11782,12 +11808,14 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
> >>
> >> group = sched_balance_find_src_group(&env);
> >> if (!group) {
> >> + update_newidle_cost(sd, sd->max_newidle_lb_cost + sd->max_newidle_lb_cost / 2);
> >> schedstat_inc(sd->lb_nobusyg[idle]);
> >> goto out_balanced;
> >> }
> >>
> >> busiest = sched_balance_find_src_rq(&env, group);
> >> if (!busiest) {
> >> + update_newidle_cost(sd, sd->max_newidle_lb_cost + sd->max_newidle_lb_cost / 2);
> >> schedstat_inc(sd->lb_nobusyq[idle]);
> >> goto out_balanced;
> >> }
> >
> > So sched_balance_rq() is used for pretty much all load-balancing, not
> > just newidle.
> >
> > Either make this conditional like:
> >
> > if (idle == CPU_NEWLY_IDLE)
> > update_newidle_cost(...);
> >
> > or do it at the callsite, where we find !pulled_task (ie failure).
> >
> > Specifically, we already do update_newidle_cost() there, perhaps inflate
> > the cost there instead?
> >
> > if (!pulled_task)
> > domain_cost += sysctl_sched_migration_cost;
>
> Got it, I'll play with that. Vincent, was there a benchmark I can use
> to see if I've regressed the case you were focused on?
It's not a public benchmark, but I had some unit tests with tasks
waiting on a busy CPU while other CPUs become idle for a "long" time
(but still less than 500us on average). This is even more true with
frequency scaling, which minimizes the idle duration by decreasing
the frequency.
>
> -chris
>
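To make the callsite suggestion and the short-idle case above concrete,
here is a rough userspace sketch, not kernel code. struct sd_model, the
model_* helpers and MODEL_MIGRATION_COST_NS are hypothetical names made
up for illustration; the 500us constant is only an assumed stand-in for
sysctl_sched_migration_cost, and the update helper leaves out the decay
of the recorded maximum over time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 500us in ns, standing in for sysctl_sched_migration_cost. */
#define MODEL_MIGRATION_COST_NS 500000ULL

/* Hypothetical stand-in for struct sched_domain; only the field discussed. */
struct sd_model {
	uint64_t max_newidle_lb_cost;	/* ns */
};

/* Simplified: only raises the recorded maximum, no decay over time. */
static inline void model_update_newidle_cost(struct sd_model *sd, uint64_t cost)
{
	assert(sd != NULL);
	if (cost > sd->max_newidle_lb_cost)
		sd->max_newidle_lb_cost = cost;
}

/*
 * One newidle balance attempt as seen from the callsite: the measured
 * cost is inflated when nothing was pulled, then recorded.
 * Returns the cost that was fed to the update.
 */
static inline uint64_t model_newidle_attempt(struct sd_model *sd,
					     uint64_t measured_ns,
					     int pulled_task)
{
	uint64_t domain_cost = measured_ns;

	if (!pulled_task)
		domain_cost += MODEL_MIGRATION_COST_NS;

	model_update_newidle_cost(sd, domain_cost);
	return domain_cost;
}

/* Assumed skip rule: do not try newidle balance if the expected idle
 * time is shorter than the recorded maximum cost for this domain. */
static inline bool model_would_skip_newidle(const struct sd_model *sd,
					    uint64_t avg_idle_ns)
{
	return avg_idle_ns < sd->max_newidle_lb_cost;
}
```

In this model, a single failed attempt measured at 20us pushes the
recorded maximum to 520us, so a CPU whose average idle time is 400us
would skip newidle balance at that domain from then on, even if tasks
are waiting on a busy CPU. That is the kind of case the tests mentioned
above are meant to catch.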