[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250626140259.GL1613200@noisy.programming.kicks-ass.net>
Date: Thu, 26 Jun 2025 16:02:59 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Chris Mason <clm@...a.com>
Cc: Jianyong Wu <jianyong.wu@...look.com>, Chris Mason <clm@...com>,
vincent.guittot@...aro.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC] sched/fair: bump sd->max_newidle_lb_cost when
newidle balance fails
On Thu, Jun 26, 2025 at 09:48:29AM -0400, Chris Mason wrote:
> FWIW, the main reason I parked this in sched_balance_rq() was because I
> wasn't sure if there were other reasons we might fail to pull a task
> where I didn't want to bump the cost. You've got a good point though.
>
> This v2 performs the same. It takes about 60 seconds to warm up to the
> point where schbench performance is fast. Below I've also got
> the drgn script I'm using the dump the avg_idle and sd info along with
> output while schbench is running.
>
> Peter if this is what you're looking for I'll send a new version,
> without whatever whitespace mangling exchange does to this email.
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7a14da5396fb2..6a3345168870a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -12173,9 +12173,11 @@ static inline bool update_newidle_cost(struct sched_domain *sd, u64 cost)
> if (cost > sd->max_newidle_lb_cost) {
> /*
> * Track max cost of a domain to make sure to not delay the
> - * next wakeup on the CPU.
> + * next wakeup on the CPU. Also, cap this cost at the sched_migration
> + * sysctl, plus a little extra so we don't have to worry about off by 1s
It explains we cap (as can be read from the code); perhaps explain why
the cap is important though.
> */
> - sd->max_newidle_lb_cost = cost;
> + sd->max_newidle_lb_cost =
> + min(cost, sysctl_sched_migration_cost + 200);
> sd->last_decay_max_lb_cost = jiffies;
> } else if (time_after(jiffies, sd->last_decay_max_lb_cost + HZ)) {
> /*
> @@ -12867,7 +12869,13 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
>
> t1 = sched_clock_cpu(this_cpu);
> domain_cost = t1 - t0;
> - update_newidle_cost(sd, domain_cost);
> +
> + /* bump the newidle cost if balances aren't productive */
> + if (!pulled_task)
> + update_newidle_cost(sd,
> + sd->max_newidle_lb_cost + sd->max_newidle_lb_cost / 2);
> + else
> + update_newidle_cost(sd, domain_cost);
>
> curr_cost += domain_cost;
> t0 = t1;
I think I prefer the form:
domain_cost = t1 - t0;
curr_cost += domain_cost;
t0 = t1;
/*
* Failing newidle means it is not effective;
* bump the cost so we end up doing less of it.
*/
if (!pulled_task)
domain_cost = (3 * sd->max_newidle_lb_cost) / 2;
update_newidle_cost(sd, domain_cost);
But yes. Thanks!
Powered by blists - more mailing lists