linux-kernel - Re: [PATCH RFC] sched/fair: bump sd->max_newidle_lb

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250626140259.GL1613200@noisy.programming.kicks-ass.net>
Date: Thu, 26 Jun 2025 16:02:59 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Chris Mason <clm@...a.com>
Cc: Jianyong Wu <jianyong.wu@...look.com>, Chris Mason <clm@...com>,
	vincent.guittot@...aro.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC] sched/fair: bump sd->max_newidle_lb_cost when
 newidle balance fails

On Thu, Jun 26, 2025 at 09:48:29AM -0400, Chris Mason wrote:

> FWIW, the main reason I parked this in sched_balance_rq() was because I
> wasn't sure if there were other reasons we might fail to pull a task
> where I didn't want to bump the cost.  You've got a good point though.
> 
> This v2 performs the same.  It takes about 60 seconds to warm up to the
> point where schbench performance is fast.  Below I've also got
> the drgn script I'm using the dump the avg_idle and sd info along with
> output while schbench is running.
> 
> Peter if this is what you're looking for I'll send a new version,
> without whatever whitespace mangling exchange does to this email.
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7a14da5396fb2..6a3345168870a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -12173,9 +12173,11 @@ static inline bool update_newidle_cost(struct sched_domain *sd, u64 cost)
>         if (cost > sd->max_newidle_lb_cost) {
>                 /*
>                  * Track max cost of a domain to make sure to not delay the
> -                * next wakeup on the CPU.
> +                * next wakeup on the CPU.  Also, cap this cost at the sched_migration
> +                * sysctl, plus a little extra so we don't have to worry about off by 1s

It explains we cap (as can be read from the code); perhaps explain why
the cap is important though.

>                  */
> -               sd->max_newidle_lb_cost = cost;
> +               sd->max_newidle_lb_cost =
> +                       min(cost, sysctl_sched_migration_cost + 200);
>                 sd->last_decay_max_lb_cost = jiffies;
>         } else if (time_after(jiffies, sd->last_decay_max_lb_cost + HZ)) {
>                 /*
> @@ -12867,7 +12869,13 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
> 
>                         t1 = sched_clock_cpu(this_cpu);
>                         domain_cost = t1 - t0;
> -                       update_newidle_cost(sd, domain_cost);
> +
> +                       /* bump the newidle cost if balances aren't productive */
> +                       if (!pulled_task)
> +                               update_newidle_cost(sd,
> +                               sd->max_newidle_lb_cost + sd->max_newidle_lb_cost / 2);
> +                       else
> +                               update_newidle_cost(sd, domain_cost);
> 
>                         curr_cost += domain_cost;
>                         t0 = t1;

I think I prefer the form:

			domain_cost = t1 - t0;
			curr_cost += domain_cost;
			t0 = t1;

			/*
			 * Failing newidle means it is not effective;
			 * bump the cost so we end up doing less of it.
			 */
			if (!pulled_task)
				domain_cost = (3 * sd->max_newidle_lb_cost) / 2;

			update_newidle_cost(sd, domain_cost);

But yes. Thanks!