Message-ID: <5e9789ae-9a2d-4c95-a1c3-db489d132559@intel.com>
Date: Wed, 16 Jul 2025 23:56:30 +0800
From: "Chen, Yu C" <yu.c.chen@...el.com>
To: Chris Mason <clm@...a.com>, Chris Mason <clm@...com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
<aubrey.li@...ux.intel.com>, <peterz@...radead.org>,
<vincent.guittot@...aro.org>, kernel test robot <oliver.sang@...el.com>
Subject: Re: [PATCH v2] sched/fair: bump sd->max_newidle_lb_cost when newidle
balance fails
On 7/15/2025 11:38 PM, Chris Mason wrote:
> On 7/15/25 6:08 AM, Chen, Yu C wrote:
>> On 7/15/2025 3:08 PM, kernel test robot wrote:
>>>
>>>
>>> Hello,
>>>
>>> kernel test robot noticed a 22.9% regression of unixbench.throughput on:
>>>
>>>
>>> commit: ac34cb39e8aea9915ec2f4e08c979eb2ed1d7561 ("[PATCH v2] sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails")
>>> url: https://github.com/intel-lab-lkp/linux/commits/Chris-Mason/sched-fair-bump-sd-max_newidle_lb_cost-when-newidle-balance-fails/20250626-224805
>>> base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 5bc34be478d09c4d16009e665e020ad0fcd0deea
>>> patch link: https://lore.kernel.org/all/20250626144017.1510594-2-clm@...com/
>>> patch subject: [PATCH v2] sched/fair: bump sd->max_newidle_lb_cost when newidle balance fails
>
> [ ... ]
>
>>>
>>> commit:
>>> 5bc34be478 ("sched/core: Reorganize cgroup bandwidth control
>>> interface file writes")
>>> ac34cb39e8 ("sched/fair: bump sd->max_newidle_lb_cost when newidle
>>> balance fails")
>>>
>>> 5bc34be478d09c4d ac34cb39e8aea9915ec2f4e08c9
>>> ---------------- ---------------------------
>>> %stddev %change %stddev
>>> \ | \
>> ...
>>
>>> 40.37 +16.9 57.24 mpstat.cpu.all.idle%
>>
>> This commit inhibits newidle balance. It seems that some workloads
>> dislike newidle balance, such as schbench, which runs short-duration
>> tasks, while other workloads, like the unixbench shell test case, want
>> newidle balance to pull tasks as aggressively as possible.
>> I wonder if we could check the sched domain's average utilization to
>> decide how hard to trigger newidle balance, or check the overutilized
>> flag to decide whether to launch it at all. Something like what I was
>> thinking of:
>
> Thanks for looking at this.
>
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 9e24038fa000..6c7420ed484e 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -13759,7 +13759,8 @@ static int sched_balance_newidle(struct rq
>> *this_rq, struct rq_flags *rf)
>> sd = rcu_dereference_check_sched_domain(this_rq->sd);
>>
>> if (!get_rd_overloaded(this_rq->rd) ||
>> - (sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) {
>> + (sd && this_rq->avg_idle < sd->max_newidle_lb_cost &&
>> + !READ_ONCE(this_rq->rd->overutilized))) {
>>
>> if (sd)
>> update_next_balance(sd, &next_balance);
>>
>
>
> Looking at rd->overutilized, I think we only set it when
> sched_energy_enabled(). I'm not sure if that's true often enough to use
> as a fix for hackbench?
>
OK, overutilized is only set for EAS.
I gave it a try but cannot reproduce this issue on a 240-CPU system
using unixbench:
./Run shell1 -i 30 -c 240
I'll need to double-check with lkp/0day to figure it out.
thanks,
Chenyu
> -chris
>